This work performs emotion regression by estimating Valence and Arousal values for videos in the Aff-Wild database. We used two CNN-based frameworks for this problem. The first uses an SENet pre-trained on the VGGFace database and fine-tuned on a subset of the Aff-Wild training data. The second is a ResNet-style CNN with a CBAM attention module for refined feature extraction, trained from scratch on that subset of the Aff-Wild training data.
The hyperparameters used for both models are listed below:
Batch Size = 32
Optimizer = Adam
Learning Rate = Default (the optimizer's default value)
Epochs = 32
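The settings above can be collected in one place so that both frameworks are trained with identical values. The following is only a minimal sketch: the names TrainConfig and steps_per_epoch are hypothetical and are not taken from the project code. Because the learning rate is only given as "Default", it is left unset here rather than guessed.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed above."""
    batch_size: int = 32
    optimizer: str = "adam"
    # None means "do not override": let the optimizer use its own default.
    learning_rate: Optional[float] = None
    epochs: int = 32


def steps_per_epoch(num_samples: int, config: TrainConfig) -> int:
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / config.batch_size)


config = TrainConfig()
```

For example, with 1000 training frames this configuration gives steps_per_epoch(1000, config) == 32.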
The training and validation root mean square error (RMSE) curves for both frameworks are shown below.
[Figure panels: CBAM Framework | Transfer Learning Framework]
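For reference, the root mean square error plotted in these graphs can be computed as in the sketch below. This is a plain-Python illustration of the standard formula, not the project's own evaluation code, and the function name rmse is an assumption.

```python
import math


def rmse(predictions, targets):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In this setting the function would be applied separately to the Valence and Arousal predictions, once on the training subset and once on the validation subset after each epoch.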
The predicted Valence and Arousal values were then mapped to a categorical emotion using the 2D Emotion (Valence-Arousal) Wheel below.
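One way to perform such a lookup is to convert the (Valence, Arousal) pair to an angle on the wheel and pick the sector it falls in. The sketch below assumes both values lie in [-1, 1], uses eight 45-degree sectors, and treats points close to the origin as neutral. The sector labels, the neutral radius, and the function name emotion_from_va are illustrative assumptions and may not match the wheel used in this project.

```python
import math

# Assumed sector labels, counter-clockwise from the positive Valence axis.
# Each sector spans 45 degrees; the project's actual wheel may differ.
SECTOR_LABELS = [
    "happy",    # 0 to 45 degrees
    "excited",  # 45 to 90
    "afraid",   # 90 to 135
    "angry",    # 135 to 180
    "sad",      # 180 to 225
    "bored",    # 225 to 270
    "sleepy",   # 270 to 315
    "relaxed",  # 315 to 360
]


def emotion_from_va(valence, arousal, neutral_radius=0.1):
    """Map a (valence, arousal) point to an assumed categorical label."""
    # Points near the centre of the wheel carry little emotional intensity.
    if math.hypot(valence, arousal) < neutral_radius:
        return "neutral"
    # Angle measured from the positive valence axis, wrapped into [0, 360].
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    # The final modulo guards against a rounded angle of exactly 360.0.
    index = int(angle // 45.0) % len(SECTOR_LABELS)
    return SECTOR_LABELS[index]
```

With these assumed labels, a frame with high Valence and mildly positive Arousal, such as (0.8, 0.3), falls in the first sector, while (-0.6, -0.2) falls in the sector between 180 and 225 degrees.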