Abstract: Most existing work on facial demographic estimation focuses on still-image datasets, although the need to analyze video content in real applications is increasing. We propose to tackle gender, age and ethnicity estimation in video scenarios. Our main contribution is an attribute-specific quality assessment procedure that selects the most relevant frames from a video sequence for each of the three demographic modalities. The selected frames are classified with fine-tuned MobileNet models, and a final video-level prediction is obtained with a majority voting strategy. Our validation on three different datasets and our comparison with state-of-the-art models show the effectiveness of the proposed demographic classifiers and of the quality pipeline, which reduces both the number of frames to be classified and the processing time in practical applications, and improves soft biometrics prediction accuracy.
Authors: Becerra-Riera, Fabiola; Morales-González, Annette; Mendez-Vazquez, Heydi; Dugelay, Jean-Luc (Advanced Technologies Application Center)
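The pipeline outlined in the abstract (quality-based frame selection, per-frame classification, majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: `quality_fn`, `classify_fn` and `top_k` are hypothetical names introduced here, standing in for the attribute-specific quality measure, a fine-tuned classifier, and the number of frames kept, none of which are specified in the abstract.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

Frame = TypeVar("Frame")


def predict_video_label(
    frames: Sequence[Frame],
    quality_fn: Callable[[Frame], float],    # hypothetical: attribute-specific quality score
    classify_fn: Callable[[Frame], Hashable],  # hypothetical: per-frame classifier
    top_k: int = 5,                          # hypothetical: number of frames to keep
) -> Hashable:
    """Classify only the top_k highest-quality frames and return the majority label."""
    if not frames:
        raise ValueError("at least one frame is required")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Rank frames from highest to lowest quality and keep the best top_k.
    selected = sorted(frames, key=quality_fn, reverse=True)[:top_k]

    # Classify each selected frame and count the votes per label.
    votes = Counter(classify_fn(frame) for frame in selected)

    # Majority vote; on a tie, the label seen first among the
    # highest-quality frames wins (Counter keeps insertion order).
    return votes.most_common(1)[0][0]


# Toy usage: each "frame" is a (quality, label) pair, so the scoring and
# classification callables simply read the stored values.
toy_frames = [(0.9, "female"), (0.8, "female"), (0.2, "male"), (0.1, "male"), (0.85, "male")]
label_top3 = predict_video_label(toy_frames, lambda f: f[0], lambda f: f[1], top_k=3)
label_all = predict_video_label(toy_frames, lambda f: f[0], lambda f: f[1], top_k=5)
```

In this toy example, voting over only the three highest-quality frames yields a different label than voting over all five, which shows how frame selection can change the video-level prediction while reducing the number of frames classified. Running the function once per attribute with a different `quality_fn` would mirror the attribute-specific selection the abstract describes.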