Authors: Dat Huynh, Ehsan Elhamifar
Description: We address the problem of fine-grained generalized zero-shot recognition of visually similar classes, where no training images are available for some classes. We propose a dense attribute-based attention mechanism that, for each attribute, focuses on the most relevant image regions, yielding attribute-based features. Instead of aligning a global feature vector of an image with its associated class semantic vector, we propose an attribute embedding technique that aligns each attribute-based feature with its attribute semantic vector. We thus compute a vector of attribute scores, one per attribute indicating its presence in the image, whose similarity with the true class semantic vector is maximized. Moreover, we adjust each attribute score using an attention mechanism over attributes to better capture the discriminative power of different attributes. To tackle the bias towards seen classes during testing, we propose a new self-calibration loss that adjusts the probability of unseen classes to account for the training bias. We conduct experiments on three popular datasets, CUB, SUN and AWA2, as well as on the large-scale DeepFashion dataset, showing that our model significantly improves the state of the art.
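The description outlines the pipeline only in words. The NumPy sketch below illustrates one way its pieces could fit together for a single image; it is not the authors' implementation. The bilinear projections `W_att`, `W_emb` and `W_gate`, the sigmoid gate used for the attention over attributes, and the simplified self-calibration term (negative log-probability summed over unseen classes) are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=axis, keepdims=True)


def log_softmax(x):
    # Numerically stable log-softmax for a 1-D score vector.
    z = x - x.max()
    return z - np.log(np.exp(z).sum())


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attribute_scores(F, V, W_att, W_emb, W_gate):
    """Gated attribute scores for one image (hypothetical parameterization).

    F: (R, D) features of R image regions.
    V: (A, K) semantic vectors of A attributes.
    W_att, W_emb, W_gate: (K, D) learnable projections (assumed).
    """
    # Dense attribute-based attention: one distribution over regions per attribute.
    alpha = softmax(V @ W_att @ F.T, axis=1)           # (A, R)
    H = alpha @ F                                      # (A, D) attribute-based features
    # Attribute embedding: score each attribute feature against its own
    # semantic vector, i.e. e[a] = V[a] @ W_emb @ H[a].
    e = np.einsum("ak,kd,ad->a", V, W_emb, H)          # (A,)
    # Attention over attributes (assumed sigmoid gate) reweights each score.
    beta = sigmoid(np.einsum("ak,kd,ad->a", V, W_gate, H))
    return beta * e, alpha


def zsl_losses(scores, Z, y, unseen):
    """Cross-entropy on the true class plus a simplified self-calibration term.

    Z: (C, A) class semantic vectors; y: index of the true (seen) class;
    unseen: integer array of unseen-class indices. Class logits are Z @ scores.
    """
    logp = log_softmax(Z @ scores)                     # (C,) log-probs over all classes
    ce = -logp[y]
    # Self-calibration (simplified): penalize low probability mass on unseen classes.
    cal = -logp[unseen].sum()
    return ce, cal


# Usage with random toy shapes: 49 regions, 16-dim features, 8 attributes, 6 classes.
rng = np.random.default_rng(0)
F = rng.normal(size=(49, 16))
V = rng.normal(size=(8, 5))
Ws = [0.1 * rng.normal(size=(5, 16)) for _ in range(3)]
Z = rng.normal(size=(6, 8))
scores, alpha = attribute_scores(F, V, *Ws)
ce, cal = zsl_losses(scores, Z, y=0, unseen=np.array([4, 5]))
print(scores.shape, alpha.shape, ce, cal)
```

Scoring each attribute against its own semantic vector, rather than one global image vector against the class vector, is what makes the per-attribute scores in this sketch comparable to the entries of a class semantic vector, so class logits reduce to a dot product `Z @ scores`.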