Objective To automatically segment diabetic retinal exudate features from color fundus images using deep learning. Methods An applied study. The method is based on a U-shaped network model applied to the Indian Diabetic Retinopathy Image Dataset (IDRiD). Deep residual convolution is introduced into the encoding and decoding stages, which effectively extracts deep exudate features, alleviates overfitting and feature interference, and improves the model's feature expression ability and lightweight performance. In addition, an improved context extraction module enables the model to capture a wider range of feature information, enhances its perception of retinal lesions, and performs especially well in capturing small details and blurred edges. Finally, a convolutional triplet attention mechanism allows the model to learn feature weights automatically, focus on important features, and extract useful information at multiple scales. Precision, recall, Dice coefficient, accuracy and sensitivity were used to evaluate the model's ability to automatically detect and segment retinal exudate features of diabetic patients in color fundus images. Results The precision, recall, Dice coefficient, accuracy and sensitivity of the improved model on the IDRiD dataset reached 81.56%, 99.54%, 69.32%, 65.36% and 78.33%, respectively. Compared with the original model, the accuracy and Dice index of the improved model increased by 2.35% and 3.35%, respectively. Conclusion The segmentation method based on the U-shaped network can automatically detect and segment retinal exudate features in fundus images of diabetic patients, which is of great significance in assisting doctors to diagnose diseases more accurately.
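The two building blocks named above admit a compact illustration. Below is a minimal PyTorch sketch of a residual convolution block for the U-shaped encoder/decoder and a simplified triplet-attention-style module; layer widths, kernel sizes and class names are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (1x1 conv if channels change)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class TripletAttention(nn.Module):
    """Gates the (C,H), (C,W) and (H,W) planes separately and averages the results."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2) for _ in range(3)
        )

    def _gate(self, x, conv):
        # pool along dim 1 (max + mean), then a small conv produces a sigmoid gate
        s = torch.cat([x.max(dim=1, keepdim=True).values,
                       x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(conv(s))

    def forward(self, x):
        b1 = self._gate(x.permute(0, 2, 1, 3), self.convs[0]).permute(0, 2, 1, 3)  # rotate C<->H
        b2 = self._gate(x.permute(0, 3, 2, 1), self.convs[1]).permute(0, 3, 2, 1)  # rotate C<->W
        b3 = self._gate(x, self.convs[2])                                          # plain (H,W)
        return (b1 + b2 + b3) / 3.0

x = torch.randn(1, 32, 64, 64)
y = TripletAttention()(ResidualConvBlock(32, 64)(x))
print(y.shape)  # torch.Size([1, 64, 64, 64])
```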
The incidence of tinnitus is very high; it can affect the patient's attention, emotion and sleep, and even cause serious psychological distress and suicidal tendencies. Currently there is no uniform and objective method for tinnitus detection and therapy, and the mechanism of tinnitus is still unclear. In this study, we first collected resting-state electroencephalogram (EEG) data from tinnitus patients and healthy subjects. The power spectrum topographies were then compared in the δ (0.5–3 Hz), θ (4–7 Hz), α (8–13 Hz), β (14–30 Hz) and γ (31–50 Hz) bands to explore the central mechanism of tinnitus. A total of 16 tinnitus patients and 16 healthy subjects were recruited for the experiment. The resting-state EEG experiments found that the spectral power of tinnitus patients was higher than that of healthy subjects in all frequency bands of interest. The t-test results showed that the significantly different areas were mainly concentrated in the right temporal lobe in the θ and α bands, and in the temporal lobe, parietal lobe and frontal area in the β and γ bands. In addition, we designed an attention-related task experiment to further study the relationship between tinnitus and attention. The results showed that the classification accuracy for tinnitus patients was significantly lower than that for healthy subjects, with the highest classification accuracies being 80.21% and 88.75%, respectively. These experimental results indicate that tinnitus may cause a decline in patients' attention.
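The per-band spectral power comparison described above can be sketched compactly. The following minimal Python example computes Welch power spectra per EEG channel and sums them within each band; the channel count, sampling rate and synthetic data are illustrative assumptions, not the study's recording setup.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}
FS = 250  # sampling rate in Hz (assumption)

def band_powers(eeg):
    """eeg: (n_channels, n_samples) array -> dict of per-channel band powers."""
    freqs, psd = welch(eeg, fs=FS, nperseg=FS * 2, axis=-1)
    df = freqs[1] - freqs[0]
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs <= hi)
        out[name] = psd[:, mask].sum(axis=-1) * df  # integrate PSD over the band
    return out

eeg = np.random.randn(32, FS * 60)  # 32 channels, 60 s of synthetic resting-state data
powers = band_powers(eeg)
print({k: float(v.mean()) for k, v in powers.items()})
```

A per-channel group comparison then reduces to an independent-samples t-test (e.g. scipy.stats.ttest_ind) on these band powers between the patient and healthy groups.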
Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, not only is the modeling of "visual attention" crucial, but the modeling of "question attention" is equally significant. To facilitate bidirectional reasoning in the attention processes involving medical images and questions, a new MVQA architecture, named MCAN, has been proposed. This architecture incorporated a cross-modal co-attention network, FCAF, which identifies key words in questions and principal parts in images. Through a meta-learning channel attention module (MLCA), weights were adaptively assigned to each word and region, reflecting the model's focus on specific words and regions during reasoning. Additionally, this study designed a medical domain-specific word embedding model, Med-GloVe, to further enhance the model's accuracy and practical value. Experimental results indicated that the proposed MCAN improved accuracy by 7.7% on free-form questions in the Path-VQA dataset and by 4.4% on closed-form questions in the VQA-RAD dataset, effectively improving the accuracy of medical visual question answering.
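The bidirectional co-attention idea can be illustrated with a short PyTorch sketch: question tokens attend to image regions and image regions attend to question tokens. The dimensions and the CoAttentionBlock name are illustrative assumptions and are not the paper's FCAF or MLCA modules.

```python
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """Bidirectional cross-modal attention between image regions and question tokens."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_feats, txt_feats):
        # question tokens attend to image regions, and vice versa
        txt_out, _ = self.img_to_txt(txt_feats, img_feats, img_feats)
        img_out, _ = self.txt_to_img(img_feats, txt_feats, txt_feats)
        return img_out, txt_out

img = torch.randn(2, 49, 512)   # 7x7 grid of image-region features (assumption)
txt = torch.randn(2, 14, 512)   # 14 embedded question tokens (assumption)
img_attn, txt_attn = CoAttentionBlock()(img, txt)
print(img_attn.shape, txt_attn.shape)  # (2, 49, 512) (2, 14, 512)
```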
In the clinical stage, suspected hemolytic plasma may cause hemolytic illness, manifesting as symptoms such as heart failure and severe anemia. Applying deep learning to plasma images significantly improves recognition accuracy, so this paper proposes a plasma quality detection model based on an improved "You Only Look Once" 5th version (YOLOv5). The proposed model and its evaluation system were applied to the plasma datasets, and the average accuracy of the final classification reached 98.7%. These experimental results were obtained by combining several key algorithm modules: omni-dimensional dynamic convolution, pooling with separable kernel attention, a residual bi-fusion feature pyramid network, and re-parameterization convolution. The proposed method efficiently captures spatially mapped feature information and enhances the average recognition accuracy of plasma quality detection. This paper presents a high-efficiency detection method for plasma images, aiming to provide a practical approach to preventing hemolytic illnesses caused by external factors.
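Of the modules listed above, re-parameterization convolution has a particularly clean core idea: parallel 3x3, 1x1 and identity branches used during training are algebraically fused into a single 3x3 convolution for inference. The PyTorch sketch below demonstrates the fusion under assumed channel sizes; it is a minimal illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConv(nn.Module):
    """Training-time parallel branches (3x3 + 1x1 + identity) fusible into one 3x3 conv."""
    def __init__(self, ch):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(ch, ch, 1, bias=True)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x) + x

    def fuse(self):
        """Return a single 3x3 conv equivalent to the three branches."""
        ch = self.conv3.in_channels
        fused = nn.Conv2d(ch, ch, 3, padding=1, bias=True)
        w = self.conv3.weight.clone()
        w += F.pad(self.conv1.weight, [1, 1, 1, 1])  # place 1x1 kernel at the 3x3 center
        eye = torch.zeros_like(w)
        for c in range(ch):
            eye[c, c, 1, 1] = 1.0                    # identity branch as a 3x3 kernel
        fused.weight.data = w + eye
        fused.bias.data = self.conv3.bias + self.conv1.bias
        return fused

x = torch.randn(1, 8, 16, 16)
m = RepConv(8).eval()
print(torch.allclose(m(x), m.fuse()(x), atol=1e-5))  # True: same output, one conv
```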
Attention concentrates our mental resources on processing certain objects of interest, and is an important mental behavior and cognitive process. Recognizing attentional states has great significance for improving human performance and reducing errors. However, there is still no direct and standardized way to monitor a person's attentional states. Based on the fact that visual attention can modulate the steady-state visual evoked potential (SSVEP), we designed a go/no-go experimental paradigm with 10 Hz steady-state visual stimulation in the background to investigate the separability of SSVEP features modulated by different visual attentional states. The experiment recorded the EEG signals of 15 postgraduate volunteers under high and low visual attentional states, with the states determined by behavioral responses. We analyzed the differences in SSVEP signals between the high and low attentional levels, and applied classification algorithms to recognize these differences. Results showed that the discriminant canonical pattern matching (DCPM) algorithm performed better than the linear discriminant analysis (LDA) and canonical correlation analysis (CCA) algorithms, achieving up to 76% accuracy. Our results show that the SSVEP features modulated by different visual attentional states are separable, which provides a new way to monitor visual attentional states.
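Of the compared algorithms, CCA is the most compact to sketch: each EEG epoch is correlated with sine/cosine reference signals at the 10 Hz stimulation frequency, and the canonical correlation serves as the SSVEP feature. The sampling rate, epoch length, channel count and harmonic count below are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

FS, F_STIM, N_HARM = 250, 10.0, 2  # sampling rate, stimulus frequency, harmonics (assumptions)

def ssvep_cca_score(epoch):
    """epoch: (n_samples, n_channels) -> max canonical correlation with 10 Hz references."""
    t = np.arange(epoch.shape[0]) / FS
    refs = np.column_stack(
        [f(2 * np.pi * h * F_STIM * t) for h in range(1, N_HARM + 1)
         for f in (np.sin, np.cos)]
    )
    cca = CCA(n_components=1)
    x_c, y_c = cca.fit_transform(epoch, refs)
    return np.corrcoef(x_c[:, 0], y_c[:, 0])[0, 1]

epoch = np.random.randn(FS * 2, 8)  # 2 s epoch, 8 channels of synthetic EEG
print(ssvep_cca_score(epoch))
```

Comparing this score distribution between behaviorally defined high- and low-attention epochs is one way to quantify the attentional modulation of the SSVEP.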
To assist grassroots sonographers in accurately and rapidly detecting intussusception lesions in children's abdominal ultrasound images, this paper proposes an improved YOLOv8n intussusception detection algorithm, called EMC-YOLOv8n. Firstly, the EfficientViT network with a cascaded group attention module was used as the backbone to increase detection speed. Secondly, the improved C2fMBC module replaced the C2f module in the neck network to reduce network complexity, and a coordinate attention (CA) module was introduced after each C2fMBC module to strengthen attention to positional information. Finally, experiments were conducted on a self-built dataset of intussusception in children. The results showed that the recall rate, average detection accuracy (mAP@0.5) and precision of the EMC-YOLOv8n algorithm improved by 3.9%, 2.1% and 0.9%, respectively, compared with the baseline algorithm. Although the network parameters and computational load increase slightly, the marked improvement in detection accuracy enables detection tasks to be completed efficiently, demonstrating substantial economic and social value.
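The coordinate attention module mentioned above pools features separately along height and width so that the attention map retains positional information. The PyTorch sketch below follows that general design; the reduction ratio and sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Direction-aware channel attention: separate pooling along H and along W."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        mid = max(8, ch // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.attn_h = nn.Conv2d(mid, ch, 1)
        self.attn_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y = self.shared(torch.cat([pool_h, pool_w], dim=2))       # joint 1x1 transform
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                     # (b, c, h, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2))) # (b, c, 1, w)
        return x * a_h * a_w                                      # position-aware gating

x = torch.randn(1, 64, 40, 40)
print(CoordinateAttention(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```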
Although attention plays an important role in cognition and perception, there is no simple way to measure one's attention abilities. We found that the strength of the brain functional network during a sustained attention task can be used as a physiological indicator to predict behavioral performance. Behavioral and electroencephalogram (EEG) data from 14 subjects during three force control tasks were collected in this study. The reciprocal of the product of force tolerance and variance was used to calculate the behavioral performance score. The EEG data were used to construct brain network connectivity by the wavelet coherence method, and correlation analysis was then performed between each edge of the connectivity matrices and the behavioral score. A linear regression model combined the significantly correlated network connections into a physiological indicator to predict each participant's performance on the three force control tasks, all of which had correlation coefficients greater than 0.7. These results indicate that brain functional network strength can provide a widely applicable biomarker for sustained attention tasks.
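The connectivity-to-behavior pipeline above can be sketched in a few lines. The example below uses spectral coherence as a stand-in for wavelet coherence (SciPy has no built-in wavelet coherence): it builds a channel-by-channel connectivity matrix per subject, then correlates each edge with the behavioral score. All data and sizes here are synthetic assumptions.

```python
import numpy as np
from scipy.signal import coherence
from scipy.stats import pearsonr

FS, N_CH, N_SUB = 250, 8, 14  # sampling rate, channels (assumption), subjects

def connectivity(eeg):
    """eeg: (n_channels, n_samples) -> symmetric (n_channels, n_channels) mean coherence."""
    conn = np.eye(N_CH)
    for i in range(N_CH):
        for j in range(i + 1, N_CH):
            _, coh = coherence(eeg[i], eeg[j], fs=FS, nperseg=FS)
            conn[i, j] = conn[j, i] = coh.mean()
    return conn

rng = np.random.default_rng(0)
conns = np.stack([connectivity(rng.standard_normal((N_CH, FS * 30)))
                  for _ in range(N_SUB)])
scores = rng.standard_normal(N_SUB)  # behavioral scores (synthetic)

# correlate every upper-triangle edge with behavior; keep the significant ones
iu = np.triu_indices(N_CH, k=1)
sig_edges = [(i, j) for i, j in zip(*iu)
             if pearsonr(conns[:, i, j], scores)[1] < 0.05]
print(f"{len(sig_edges)} significant edges")
```

The significant edges would then serve as regressors in a linear model (e.g. sklearn's LinearRegression) predicting the behavioral score.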
In audiovisual emotion recognition, representation learning is a research direction receiving considerable attention, and the key lies in constructing effective affective representations with both consistency and variability. However, accurately realizing such affective representations still faces many challenges. For this reason, this paper proposed a cross-modal audiovisual recognition model based on a multi-head cross-attention mechanism. The model achieved feature fusion and modality alignment through a multi-head cross-attention architecture, and adopted a segmented training strategy to cope with the missing-modality problem. In addition, a unimodal auxiliary loss task was designed and shared parameters were used in order to preserve the independent information of each modality. Ultimately, the model achieved macro and micro F1 scores of 84.5% and 88.2%, respectively, on the crowdsourced annotated multimodal emotion dataset of actor performances (CREMA-D). The proposed model effectively captures intra- and inter-modal feature representations of the audio and video modalities, and unifies the unimodal and multimodal emotion recognition frameworks, providing a new solution for audiovisual emotion recognition.
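A minimal PyTorch sketch of this kind of model is shown below: multi-head cross-attention fuses audio and video features, and unimodal auxiliary heads with a shared classifier add per-modality losses. The dimensions, class count and loss weight are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    """Bidirectional audio-video cross-attention with unimodal auxiliary heads."""
    def __init__(self, dim=256, heads=4, n_classes=6):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.shared_head = nn.Linear(dim, n_classes)  # shared parameters for all heads

    def forward(self, audio, video):
        v_fused, _ = self.a2v(video, audio, audio)  # video queries attend to audio
        a_fused, _ = self.v2a(audio, video, video)  # audio queries attend to video
        fused = torch.cat([a_fused, v_fused], dim=1).mean(dim=1)
        return (self.shared_head(fused),
                self.shared_head(audio.mean(dim=1)),   # unimodal auxiliary logits
                self.shared_head(video.mean(dim=1)))

model, ce = AVFusion(), nn.CrossEntropyLoss()
audio, video = torch.randn(4, 20, 256), torch.randn(4, 16, 256)  # synthetic features
labels = torch.randint(0, 6, (4,))
logits, a_logits, v_logits = model(audio, video)
loss = ce(logits, labels) + 0.3 * (ce(a_logits, labels) + ce(v_logits, labels))
print(loss.item())
```

The auxiliary terms keep each unimodal path discriminative on its own, which is also what allows inference to degrade gracefully when one modality is missing.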
Electrocardiogram (ECG) can visually reflect the physiological electrical activity of the human heart, which is important in the field of arrhythmia detection and classification. To address the negative effect of label imbalance in ECG data on arrhythmia classification, this paper proposes a nested long short-term memory network (NLSTM) model for unbalanced ECG signal classification. The NLSTM is built to learn and memorize the temporal characteristics of complex signals, and the focal loss function is used to reduce the weights of easily identifiable samples. A residual attention mechanism then modifies the assigned weights according to the importance of sample characteristics to address the sample imbalance problem. The synthetic minority over-sampling technique is also used to perform a simple manual oversampling process on the Massachusetts Institute of Technology-Beth Israel Hospital arrhythmia (MIT-BIH-AR) database to further increase the classification accuracy of the model. Finally, the MIT-BIH arrhythmia database is used to verify the above algorithms experimentally. The experimental results show that the proposed method can effectively address imbalanced samples and inconspicuous features in ECG signals, with an overall accuracy of 98.34%. The method also significantly improves the recognition and classification of minority samples, providing a new and feasible approach for ECG-assisted diagnosis with practical application significance.
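The focal loss named above down-weights easily classified samples so that training focuses on hard (often minority-class) beats. The following minimal PyTorch sketch shows the standard multi-class form; the gamma and alpha values and the class count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Cross-entropy scaled by (1 - p_t)^gamma, which suppresses easy samples."""
    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma, self.alpha = gamma, alpha

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")
        p_t = torch.exp(-ce)  # model's probability for the true class
        return (self.alpha * (1 - p_t) ** self.gamma * ce).mean()

logits = torch.randn(8, 5)           # 5 arrhythmia classes (assumption)
targets = torch.randint(0, 5, (8,))
print(FocalLoss()(logits, targets).item())
```

When p_t is near 1 (an easy sample) the factor (1 - p_t)^gamma shrinks the loss toward zero, while hard samples keep close to their full cross-entropy weight.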
Attention deficit/hyperactivity disorder (ADHD) is a behavioral disorder syndrome found mainly in the school-age population. At present, the diagnosis of ADHD depends mainly on subjective methods, leading to high rates of misdiagnosis and missed diagnosis. To solve these problems, we proposed an algorithm for classifying ADHD objectively based on a convolutional neural network. First, preprocessing steps, including skull stripping, Gaussian kernel smoothing, etc., were applied to brain magnetic resonance imaging (MRI). Then, coarse segmentation was used to select the right caudate nucleus, left precuneus, and left superior frontal gyrus regions. Finally, a three-level convolutional neural network was used for classification. Experimental results showed that the proposed algorithm could classify ADHD and normal groups effectively: the classification accuracies obtained from the right caudate nucleus and left precuneus regions were greater than the highest classification accuracy (62.52%) in the ADHD-200 competition, and among the three brain regions the accuracy from the right caudate nucleus was the highest. It can be concluded that the proposed method, combining coarse segmentation and deep learning, is useful for classifying ADHD and normal groups. The classification accuracy of the proposed method is high and the computation is simple. Moreover, the method can better extract subtle image features, and it overcomes the shortcomings of traditional MRI brain-area segmentation methods, which are time-consuming and highly complicated. The method provides an objective diagnostic approach for ADHD.
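A three-level convolutional classifier over a cropped brain-region volume can be sketched very compactly. The PyTorch example below is a minimal illustration in the spirit of the approach; the 3D input size, channel widths and two-class output are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

def level(in_ch, out_ch):
    """One conv level: 3x3x3 convolution, ReLU, and 2x downsampling."""
    return nn.Sequential(nn.Conv3d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True), nn.MaxPool3d(2))

model = nn.Sequential(
    level(1, 8), level(8, 16), level(16, 32),   # three convolutional levels
    nn.Flatten(),
    nn.Linear(32 * 4 * 4 * 4, 2),               # ADHD vs. normal logits
)

roi = torch.randn(1, 1, 32, 32, 32)  # one coarsely segmented region volume (assumption)
print(model(roi).shape)              # torch.Size([1, 2])
```

Training one such classifier per candidate region (right caudate nucleus, left precuneus, left superior frontal gyrus) and comparing their accuracies mirrors the per-region comparison reported above.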