Traditional scale-based assessment of lower limb rehabilitation is time-consuming and difficult to apply during exoskeleton rehabilitation training. To address this problem, this paper proposes a quantitative assessment method for lower limb walking ability during lower limb exoskeleton robot training, based on multimodal synergistic information fusion. The method improves the efficiency and reliability of rehabilitation assessment by introducing quantitative synergy indicators that fuse electrophysiological and kinematic information. First, electromyographic (EMG) and kinematic data of the lower extremity were collected from subjects during exoskeleton-assisted walking training. Then, based on muscle synergy theory, a synergy quantification algorithm was used to construct synergy index features from the EMG and kinematic signals. Finally, the electrophysiological and kinematic information was fused to build a modal feature fusion model that outputs a lower limb motor function score. The experimental results showed that the EMG and kinematic synergy features correlated with the clinical scale with coefficients of 0.799 and 0.825, respectively, while fusing the synergy features in a K-nearest neighbor (KNN) model yielded a higher correlation coefficient (r = 0.921, P < 0.01). The method can adjust the exoskeleton robot's rehabilitation training mode according to the assessment results, providing a basis for a synchronized "human-in-the-loop" assessment-training mode and a potential approach to remote rehabilitation training and assessment of the lower extremity.
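A minimal sketch of such a pipeline is shown below. The abstract does not give implementation details, so the NMF-based synergy extraction, the variance-accounted-for (VAF) synergy index, and the feature layout are assumptions; only the fusion of EMG and kinematic synergy features in a KNN model is taken from the text.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumptions: synergies are extracted with non-negative matrix factorization (NMF),
# the synergy index is the variance accounted for (VAF), and the KNN model is a regressor
# that maps the fused two-dimensional feature to a clinical motor function score.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.neighbors import KNeighborsRegressor

def synergy_index(envelope, n_synergies=4):
    """Extract synergies from a non-negative envelope matrix (channels x samples)
    and return the reconstruction VAF as a scalar synergy index.
    Kinematic signals are assumed to be preprocessed to be non-negative."""
    model = NMF(n_components=n_synergies, init="nndsvda", max_iter=500)
    W = model.fit_transform(envelope)   # synergy weightings (channels x synergies)
    H = model.components_               # activation coefficients (synergies x samples)
    reconstruction = W @ H
    vaf = 1.0 - np.sum((envelope - reconstruction) ** 2) / np.sum(envelope ** 2)
    return vaf

def fit_fusion_model(emg_envelopes, kin_envelopes, clinical_scores, k=5):
    """Fuse per-subject EMG and kinematic synergy indices and fit a KNN regressor."""
    features = np.array([
        [synergy_index(emg), synergy_index(kin)]
        for emg, kin in zip(emg_envelopes, kin_envelopes)
    ])
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(features, clinical_scores)
    return knn
```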
In audiovisual emotion recognition, representation learning is a research direction receiving considerable attention, and the key lies in constructing effective affective representations that capture both cross-modal consistency and modality-specific variability. Accurately learning such representations remains challenging. To this end, this paper proposes a cross-modal audiovisual emotion recognition model based on a multi-head cross-attention mechanism. The model achieves feature fusion and modality alignment through a multi-head cross-attention architecture, and adopts a segmented training strategy to cope with the missing-modality problem. In addition, a unimodal auxiliary loss task with shared parameters was designed to preserve the independent information of each modality. The model achieved macro and micro F1 scores of 84.5% and 88.2%, respectively, on the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D). The proposed model effectively captures intra- and inter-modal feature representations of the audio and video modalities and unifies unimodal and multimodal emotion recognition within a single framework, offering a new approach to audiovisual emotion recognition.
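The sketch below illustrates one way such a fusion module could look. It is not the authors' implementation: the layer sizes, pooling, classification heads, and loss weighting are assumptions; only the bidirectional multi-head cross-attention fusion and the parameter-shared unimodal auxiliary losses are taken from the abstract.

```python
# Minimal sketch under stated assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4, n_classes=6):
        super().__init__()
        # Cross-attention in both directions: audio attends to video and vice versa.
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fusion_head = nn.Linear(2 * dim, n_classes)
        # A single shared head realizes the unimodal auxiliary task with parameter sharing.
        self.unimodal_head = nn.Linear(dim, n_classes)

    def forward(self, audio, video):
        # audio, video: (batch, seq_len, dim) feature sequences from unimodal encoders.
        a_att, _ = self.a2v(audio, video, video)   # audio queries attend to video keys/values
        v_att, _ = self.v2a(video, audio, audio)   # video queries attend to audio keys/values
        fused = torch.cat([a_att.mean(dim=1), v_att.mean(dim=1)], dim=-1)
        return (
            self.fusion_head(fused),                # multimodal prediction
            self.unimodal_head(audio.mean(dim=1)),  # auxiliary audio-only prediction
            self.unimodal_head(video.mean(dim=1)),  # auxiliary video-only prediction
        )

def total_loss(logits_fused, logits_a, logits_v, labels, aux_weight=0.3):
    """Fused cross-entropy plus weighted unimodal auxiliary terms (weight is assumed)."""
    ce = nn.functional.cross_entropy
    return ce(logits_fused, labels) + aux_weight * (ce(logits_a, labels) + ce(logits_v, labels))
```

When an input modality is missing, the corresponding auxiliary branch can simply be skipped, which is one plausible reading of the segmented training strategy described in the abstract.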