1. Scherer K R. Vocal communication of emotion: a review of research paradigms. Speech Communication, 2003, 40(1-2): 227-256.
2. Chen X Y, Wang M X, Gao Y T, et al. The interaction in judging the relationship between visual and auditory emotional information. Journal of Psychological Science, 2016, 39(4): 842-848. (in Chinese)
3. Zhang S P, Wang M, Deng H M, et al. Research on facial expression recognition algorithms in remote medical monitoring and alarm systems. China Computer & Communication (Theory Edition), 2020, 32(14): 68-70. (in Chinese)
4. Xue Y L, Mao X, Guo Y, et al. Research progress of facial expression recognition in human-computer interaction. Journal of Image and Graphics, 2009, 14(5): 764-772. (in Chinese)
5. Zhang G X. A comparative study of multimodal discourse in online and offline grammar classes. Advances in Social Sciences, 2023, 12(6): 2903-2911. (in Chinese)
6. Xie L L, Xu H F, Jiang Y, et al. An ERP study of novice and expert police officers' recognition of suspects' facial expressions and emotional body language. Psychological Exploration, 2016, 36(6): 526-534. (in Chinese)
7. Cai J, Meng Z, Khan A S, et al. Feature-level and model-level audiovisual fusion for emotion recognition in the wild//2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). San Jose, USA: IEEE, 2019: 443-448.
8. Ma Y, Hao Y, Chen M, et al. Audio-visual emotion fusion (AVEF): a deep efficient weighted approach. Information Fusion, 2019, 46: 184-192.
9. Poria S, Chaturvedi I, Cambria E, et al. Convolutional MKL based multimodal emotion recognition and sentiment analysis//2016 IEEE 16th International Conference on Data Mining (ICDM). Barcelona, Spain: IEEE, 2016: 439-448.
10. Hu T T, Shen L J, Feng Y Q, et al. Analysis of misclassification between anger and happiness in speech and text emotion recognition. Computer Technology and Development, 2018, 28(11): 124-127, 134. (in Chinese)
11. Shoaib M, Haq S U, Shah M S, et al. Audio-visual emotion recognition using multilevel fusion. The Sciencetech, 2024, 5(1): 39-51.
12. Kollias D, Zafeiriou S. Expression, affect, action unit recognition: Aff-Wild2, multi-task learning and ArcFace. arXiv preprint, 2019, arXiv: 1910.04855.
13. Goncalves L, Leem S G, Lin W C, et al. Versatile audio-visual learning for handling single and multi modalities in emotion regression and classification tasks. arXiv preprint, 2023, arXiv: 2305.07216.
14. Huang N, Liu J, Luo Y, et al. Exploring modality-shared appearance features and modality-invariant relation features for cross-modality person re-identification. Pattern Recognition, 2023, 135: 109145.
15. Chen F, Luo Z, Xu Y, et al. Complementary fusion of multi-features and multi-modalities in sentiment analysis. arXiv preprint, 2019, arXiv: 1904.08138.
16. Goncalves L, Busso C. AuxFormer: robust approach to audiovisual emotion recognition//ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE, 2022: 7357-7361.
17. Praveen R G, Cardinal P, Granger E. Audio-visual fusion for emotion recognition in the valence-arousal space using joint cross-attention. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2023, 5(3): 360-373.
18. Hazarika D, Zimmermann R, Poria S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis//Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020: 1122-1131.
19. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
20. Gong Y, Liu A H, Rouditchenko A, et al. UAVM: towards unifying audio and visual models. IEEE Signal Processing Letters, 2022, 29: 2437-2441.
21. Cao H, Cooper D G, Keutmann M K, et al. CREMA-D: crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 2014, 5(4): 377-390.
22. Zhang K, Zhang Z, Li Z, et al. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
23. Savchenko A V. EmotiEffNets for facial processing in video-based valence-arousal prediction, expression classification and action unit detection//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE/CVF, 2023: 5716-5724.
24. Mollahosseini A, Hasani B, Mahoor M H. AffectNet: a database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 2017, 10(1): 18-31.
25. Baevski A, Zhou Y, Mohamed A, et al. wav2vec 2.0: a framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 2020, 33: 12449-12460.
26. McFee B, Raffel C, Liang D, et al. librosa: audio and music signal analysis in Python//Proceedings of the 14th Python in Science Conference (SciPy 2015). Austin: SciPy, 2015: 18-24.
27. Tsai Y H, Bai S, Liang P P, et al. Multimodal transformer for unaligned multimodal language sequences//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019: 6558-6569.
28. Goncalves L, Busso C. Learning cross-modal audiovisual representations with ladder networks for emotion recognition//ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece: IEEE, 2023: 1-5.
29. Mocanu B, Tapu R, Zaharia T. Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning. Image and Vision Computing, 2023, 133: 104676.