In the field of artificial intelligence (AI) medical imaging, data annotation is a key factor in all AI development. In the traditional manual annotation process, there are prominent problems such as difficult data acquisition, high manual labor intensity, strong professionalism and low labeling quality. Therefore, an intelligent multimodal medical image annotation system is urgently needed to meet the requirements of labeling. Based on the image cloud, West China Hospital of Sichuan University collected the multimodal image data of hospital and allied hospitals, and designed a multi-modal image annotation system through information technology, which integrated various image processing algorithms and AI models to simplify the image data annotation. With the construction of annotation system, the efficiency of data labeling in the hospitals is improved, which provides necessary data support for the AI image research and related industry construction in the hospital, so as to promote the implementation of artificial intelligence industry related to medical images in the hospital.
Magnetic resonance imaging(MRI) can obtain multi-modal images with different contrast, which provides rich information for clinical diagnosis. However, some contrast images are not scanned or the quality of the acquired images cannot meet the diagnostic requirements due to the difficulty of patient's cooperation or the limitation of scanning conditions. Image synthesis techniques have become a method to compensate for such image deficiencies. In recent years, deep learning has been widely used in the field of MRI synthesis. In this paper, a synthesis network based on multi-modal fusion is proposed, which firstly uses a feature encoder to encode the features of multiple unimodal images separately, and then fuses the features of different modal images through a feature fusion module, and finally generates the target modal image. The similarity measure between the target image and the predicted image in the network is improved by introducing a dynamic weighted combined loss function based on the spatial domain and K-space domain. After experimental validation and quantitative comparison, the multi-modal fusion deep learning network proposed in this paper can effectively synthesize high-quality MRI fluid-attenuated inversion recovery (FLAIR) images. In summary, the method proposed in this paper can reduce MRI scanning time of the patient, as well as solve the clinical problem of missing FLAIR images or image quality that is difficult to meet diagnostic requirements.
Sleep stage classification is essential for clinical disease diagnosis and sleep quality assessment. Most of the existing methods for sleep stage classification are based on single-channel or single-modal signal, and extract features using a single-branch, deep convolutional network, which not only hinders the capture of the diversity features related to sleep and increase the computational cost, but also has a certain impact on the accuracy of sleep stage classification. To solve this problem, this paper proposes an end-to-end multi-modal physiological time-frequency feature extraction network (MTFF-Net) for accurate sleep stage classification. First, multi-modal physiological signal containing electroencephalogram (EEG), electrocardiogram (ECG), electrooculogram (EOG) and electromyogram (EMG) are converted into two-dimensional time-frequency images containing time-frequency features by using short time Fourier transform (STFT). Then, the time-frequency feature extraction network combining multi-scale EEG compact convolution network (Ms-EEGNet) and bidirectional gated recurrent units (Bi-GRU) network is used to obtain multi-scale spectral features related to sleep feature waveforms and time series features related to sleep stage transition. According to the American Academy of Sleep Medicine (AASM) EEG sleep stage classification criterion, the model achieved 84.3% accuracy in the five-classification task on the third subgroup of the Institute of Systems and Robotics of the University of Coimbra Sleep Dataset (ISRUC-S3), with 83.1% macro F1 score value and 79.8% Cohen’s Kappa coefficient. The experimental results show that the proposed model achieves higher classification accuracy and promotes the application of deep learning algorithms in assisting clinical decision-making.
Emotion classification and recognition is a crucial area in emotional computing. Physiological signals, such as electroencephalogram (EEG), provide an accurate reflection of emotions and are difficult to disguise. However, emotion recognition still faces challenges in single-modal signal feature extraction and multi-modal signal integration. This study collected EEG, electromyogram (EMG), and electrodermal activity (EDA) signals from participants under three emotional states: happiness, sadness, and fear. A feature-weighted fusion method was applied for integrating the signals, and both support vector machine (SVM) and extreme learning machine (ELM) were used for classification. The results showed that the classification accuracy was highest when the fusion weights were set to EEG 0.7, EMG 0.15, and EDA 0.15, achieving accuracy rates of 80.19% and 82.48% for SVM and ELM, respectively. These rates represented an improvement of 5.81% and 2.95% compared to using EEG alone. This study offers methodological support for emotion classification and recognition using multi-modal physiological signals.