west china medical publishers
Keyword
  • Title
  • Author
  • Keyword
  • Abstract
Advance search
Advance search

Search

find Keyword "Representational learning" 1 results
  • Audiovisual emotion recognition based on a multi-head cross attention mechanism

    In audiovisual emotion recognition, representational learning is a research direction receiving considerable attention, and the key lies in constructing effective affective representations with both consistency and variability. However, there are still many challenges to accurately realize affective representations. For this reason, in this paper we proposed a cross-modal audiovisual recognition model based on a multi-head cross-attention mechanism. The model achieved fused feature and modality alignment through a multi-head cross-attention architecture, and adopted a segmented training strategy to cope with the modality missing problem. In addition, a unimodal auxiliary loss task was designed and shared parameters were used in order to preserve the independent information of each modality. Ultimately, the model achieved macro and micro F1 scores of 84.5% and 88.2%, respectively, on the crowdsourced annotated multimodal emotion dataset of actor performances (CREMA-D). The model in this paper can effectively capture intra- and inter-modal feature representations of audio and video modalities, and successfully solves the unity problem of the unimodal and multimodal emotion recognition frameworks, which provides a brand-new solution to the audiovisual emotion recognition.

    Release date: Export PDF Favorites Scan
1 pages Previous 1 Next

Format

Content