Emotion recognition promises to benefit a wide range of applications, such as distance education, healthcare, and human-computer interaction. Emotions can be recognized from behavioral signals such as speech, facial expressions, and gestures, or from physiological signals such as the electroencephalogram and electrocardiogram. In contrast to behavior-based methods, emotion recognition based on physiological signals can achieve more objective and reliable results because such signals are almost impossible to disguise. This paper reviews recent advances in emotion research using physiological signals, covering emotion models, elicitation stimuli, feature extraction, and classification methods. Finally, the paper discusses remaining research challenges and future developments.
To address the widely studied problem of improving the accuracy of emotion recognition, this paper proposes an electroencephalogram (EEG) feature extraction algorithm based on wavelet packet energy entropy and an autoregressive (AR) model. An AR process can approximate the EEG signal closely and provides rich spectral information with few parameters, while the wavelet packet energy entropy reflects how the signal's spectral energy is distributed across frequency bands; combining the two captures the energy characteristics of EEG signals more fully. Feature extraction and fusion are implemented with kernel principal component analysis. Six emotional states from a public multimodal database for emotion analysis using physiological signals (DEAP) are recognized. The results show that the recognition accuracy of the proposed algorithm exceeds 90%, with a highest accuracy of 99.33%, indicating that the algorithm extracts EEG emotion features well and is an effective emotion feature extraction method that supports emotion recognition.
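Below is a minimal Python sketch of the feature pipeline described above, assuming a single-channel EEG segment sampled at 128 Hz (DEAP's preprocessed rate). The wavelet ('db4'), decomposition depth, AR order, and RBF kernel are illustrative assumptions, not the authors' reported settings.

```python
import numpy as np
import pywt  # PyWavelets, for wavelet packet decomposition
from statsmodels.regression.linear_model import yule_walker
from sklearn.decomposition import KernelPCA

def wavelet_packet_energy_entropy(x, wavelet="db4", level=4):
    """Shannon entropy of the energy distribution over the terminal
    wavelet-packet nodes (one energy value per frequency band)."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(level, order="freq")])
    p = energies / energies.sum()             # spectral energy distribution
    return -np.sum(p * np.log(p + 1e-12))

def ar_coefficients(x, order=10):
    """AR coefficients estimated from the Yule-Walker equations; a compact
    spectral description of the segment using few parameters."""
    rho, _sigma = yule_walker(x, order=order)
    return rho

def fuse_features(feature_matrix, n_components=8):
    """Fuse the stacked entropy/AR features with kernel PCA (RBF assumed)."""
    return KernelPCA(n_components=n_components,
                     kernel="rbf").fit_transform(feature_matrix)
```

In this reading, each channel contributes its entropy value concatenated with its AR coefficients, and the stacked per-segment vectors are passed through the kernel PCA fusion step before classification.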
Emotion plays an important role in people's cognition and communication. By analyzing electroencephalogram (EEG) signals to identify internal emotions and feed emotional information back in an active or passive way, affective brain-computer interfaces can effectively promote human-computer interaction. This paper focuses on emotion recognition using EEG. We systematically evaluate the performance of state-of-the-art feature extraction and classification methods on a publicly available dataset for emotion analysis using physiological signals (DEAP). Because the commonly used random split produces high correlation between training and testing samples, we use block-wise K-fold cross-validation. Moreover, we compare recognition accuracy across different time window lengths; the experiments indicate that a 4 s window is appropriate for sampling. We propose a filter-bank long short-term memory network (FBLSTM) that takes differential entropy features as input. The average accuracies for low-versus-high classification in the valence dimension, in the arousal dimension, and for the four-class combination in the valence-arousal plane are 78.8%, 78.4%, and 70.3%, respectively. These results demonstrate the advantage of our emotion recognition model over current studies in terms of classification accuracy. Our model may provide a novel method for emotion recognition in affective brain-computer interaction.
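The two methodological points above, differential entropy (DE) features and block-wise splitting, can be made concrete with a short sketch. It assumes band-passed EEG is approximately Gaussian, in which case DE reduces to 0.5·ln(2πe·σ²); the band edges and filter order are illustrative choices, while the 128 Hz rate and 4 s window follow the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Assumed band edges in Hz; the paper's exact filter bank may differ.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def differential_entropy(x):
    """DE of a (near-)Gaussian signal: 0.5 * ln(2*pi*e*var)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def de_features(segment, fs=128):
    """One DE value per band for a 4 s, single-channel EEG segment."""
    feats = []
    for lo, hi in BANDS.values():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        feats.append(differential_entropy(filtfilt(b, a, segment)))
    return np.array(feats)

def blockwise_kfold(n_segments, k=10):
    """Yield train/test indices over contiguous blocks, so temporally adjacent
    (and hence correlated) windows never straddle the train/test boundary."""
    edges = np.linspace(0, n_segments, k + 1, dtype=int)
    for i in range(k):
        test = np.arange(edges[i], edges[i + 1])
        train = np.concatenate([np.arange(0, edges[i]),
                                np.arange(edges[i + 1], n_segments)])
        yield train, test
```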
Emotion reflects the psychological and physiological state of human beings, and human emotion is expressed mainly through voice and facial expression. How to extract and effectively integrate these two modalities of emotional information is one of the main challenges in emotion recognition. This paper proposes a multi-branch bidirectional multi-scale time perception model that processes the Mel-frequency cepstral coefficients of speech in both the forward and reverse directions along the time dimension. The model uses causal convolution to obtain temporal correlation information between features at different scales and assigns attention maps to them accordingly, yielding a multi-scale fusion of speech emotion features (the core ideas are sketched in code below). In addition, the paper proposes a dynamic fusion algorithm for the two modal features which, drawing on AlexNet, uses overlapping max pooling layers to obtain richer fusion features from the concatenated feature matrices of the two modalities. Experimental results show that the proposed multi-branch bidirectional multi-scale time perception dual-modal model reaches accuracies of 97.67% and 90.14% on two public audio-visual emotion datasets, outperforming other common methods, indicating that it can effectively capture emotional feature information and improve the accuracy of emotion recognition.
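A minimal PyTorch sketch of the speech branch's core mechanisms as named above: causal convolution at several scales applied to both time directions, followed by AlexNet-style overlapping max pooling. Layer widths, dilations, and the MFCC count are assumptions for illustration; the attention-map assignment and the video branch are omitted.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution padded on the left only, so the output at time t
    depends only on inputs at times <= t."""
    def __init__(self, ch_in, ch_out, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(ch_in, ch_out, kernel_size, dilation=dilation)

    def forward(self, x):                       # x: (batch, channels, time)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class BiMultiScaleBranch(nn.Module):
    """Multi-scale causal branches run over forward and time-reversed MFCCs,
    concatenated and reduced with overlapping max pooling (kernel 3, stride 2)."""
    def __init__(self, n_mfcc=40, ch=64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            CausalConv1d(n_mfcc, ch, kernel_size=3, dilation=d)
            for d in dilations)
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2)   # overlapping pooling

    def forward(self, mfcc):                    # mfcc: (batch, n_mfcc, time)
        outs = []
        for x in (mfcc, torch.flip(mfcc, dims=[2])):        # forward / reversed
            scales = [torch.relu(branch(x)) for branch in self.branches]
            outs.append(torch.cat(scales, dim=1))           # multi-scale fusion
        return self.pool(torch.cat(outs, dim=1))

# Example: a batch of 8 utterances with 40 MFCCs over 200 frames.
feats = BiMultiScaleBranch()(torch.randn(8, 40, 200))       # -> (8, 384, 99)
```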