Deep learning has recently achieved impressive results in medical imaging tasks. However, it usually requires large-scale annotated data, and medical images are expensive to annotate, so learning efficiently from limited annotated data remains a challenge. The two most commonly used remedies are transfer learning and self-supervised learning, but neither has been studied much for multimodal medical images. This study therefore proposes a contrastive learning method for multimodal medical images. The method treats images of different modalities from the same patient as positive samples, which increases the number of positive samples during training and helps the model learn the similarities and differences of lesions across modalities, thereby improving the model's understanding of medical images and its diagnostic accuracy. Because commonly used data augmentation methods are not suitable for multimodal images, this paper also proposes a domain adaptive denormalization method that transforms source-domain images using statistical information from the target domain. The method is validated on two multimodal medical image classification tasks. In the microvascular infiltration recognition task, it achieves an accuracy of (74.79 ± 0.74)% and an F1 score of (78.37 ± 1.94)%, both higher than those of other conventional learning methods; in the brain tumor pathology grading task, it also achieves significant improvements. The results show that the method performs well on multimodal medical images and can serve as a reference solution for pre-training on multimodal medical images.
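The abstract above does not give the formula for domain adaptive denormalization, so the following is only a minimal sketch of one plausible reading: the source image is normalized with its own global mean and standard deviation and then rescaled with the statistics of a target-domain image. The function name `denormalize_to_target`, the use of global (rather than per-channel or per-region) statistics, and the synthetic arrays are all assumptions made for illustration.

```python
import numpy as np

def denormalize_to_target(source, target, eps=1e-8):
    """Re-style `source` with the global mean/std of `target` (assumed formulation)."""
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Step 1: normalize the source image with its own statistics.
    normalized = (source - source.mean()) / (source.std() + eps)
    # Step 2: denormalize with the target-domain statistics.
    return normalized * target.std() + target.mean()

# Synthetic stand-ins for two modalities with very different intensity ranges.
rng = np.random.default_rng(0)
source_image = rng.normal(loc=100.0, scale=20.0, size=(64, 64))
target_image = rng.normal(loc=0.3, scale=0.05, size=(64, 64))
augmented = denormalize_to_target(source_image, target_image)
```

In this sketch the spatial structure of the source image is untouched; only its intensity distribution is shifted toward that of the target modality, which is the property that would make it usable as an augmentation across modalities.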
To overcome the difficulty of lung parenchyma segmentation caused by factors such as lung disease and bronchial interference, a three-dimensional lung parenchyma segmentation algorithm is presented that integrates the surfacelet transform with a pulse coupled neural network (PCNN). First, the three-dimensional computed tomography volume of the lungs is decomposed in the surfacelet transform domain to obtain multi-scale and multi-directional sub-band information, and the edge features are enhanced by filtering the sub-band coefficients with a local modified Laplacian operator. Second, the inverse surfacelet transform is applied and the reconstructed image is fed to the input of the PCNN. Finally, the PCNN is iterated to obtain the final segmentation result. The proposed algorithm is validated on samples from a public dataset. The experimental results demonstrate that it outperforms the three-dimensional surfacelet transform edge detection algorithm, the three-dimensional region growing algorithm, and the three-dimensional U-Net algorithm. It effectively suppresses interference from lung lesions and bronchi and obtains a complete lung parenchyma structure.
Medical image fusion integrates the advantages of functional and anatomical images. This article reviews research progress in feature-level multimodal medical image fusion. We first describe the principle of feature-level medical image fusion. We then analyze and summarize the application of fuzzy sets, rough sets, D-S evidence theory, artificial neural networks, principal component analysis, and other fusion methods to medical image fusion. Finally, we point out current problems and future research directions for multimodal medical image fusion.
With socioeconomic development and advances in medicine, degenerative heart valve disease has become the main form of heart valve disease. Calcific aortic valve disease (CAVD) is one of the most representative manifestations of degenerative valvular disease. Aortic valve calcification (AVC) has been found to be a strong predictor of major cardiovascular events, which makes it necessary to identify an effective way to evaluate the degree of AVC. Numerous methods for the quantitative assessment of AVC have been reported. Here, we discuss these methods from the perspectives of pathology and imaging.
To address the difficulty of preserving anatomical structures, the low realism of generated images, and the loss of high-frequency information in cross-modal medical image translation, this paper proposes a cross-modal medical image translation method based on diffusion generative adversarial networks. First, an unsupervised translation module converts magnetic resonance imaging (MRI) images into pseudo-computed tomography (CT) images. Next, a nonlinear frequency decomposition module extracts high-frequency CT images. Finally, the pseudo-CT image is fed into the forward process, while the high-frequency CT image is used as a conditional input to guide the reverse process that generates the final CT image. The proposed model is evaluated on the SynthRAD2023 dataset, which targets CT image generation for radiotherapy planning. The generated brain CT images achieve a Fréchet Inception Distance (FID) of 33.1597, a structural similarity index measure (SSIM) of 89.84%, a peak signal-to-noise ratio (PSNR) of 35.5965 dB, and a mean squared error (MSE) of 17.8739. The generated pelvic CT images achieve an FID of 33.9516, an SSIM of 91.30%, a PSNR of 34.8707 dB, and an MSE of 17.4658. Experimental results show that the proposed model generates highly realistic CT images while preserving anatomical accuracy as much as possible. The translated CT images can be used effectively in radiotherapy planning, further enhancing diagnostic efficiency.
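The MSE and PSNR reported above are standard pixel-wise fidelity measures, and a minimal sketch of how they are commonly computed is given here. The abstract does not state the intensity scale or any normalization used for evaluation, so the `data_range` argument and the toy arrays are assumptions, and this sketch is not expected to reproduce the reported values.

```python
import numpy as np

def mse(reference, generated):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    return float(np.mean((reference - generated) ** 2))

def psnr(reference, generated, data_range):
    """Peak signal-to-noise ratio in dB; `data_range` is the assumed maximum intensity span."""
    err = mse(reference, generated)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))

# Toy example: a constant offset of 0.1 on images scaled to [0, 1].
reference_ct = np.zeros((4, 4))
generated_ct = np.full((4, 4), 0.1)
toy_mse = mse(reference_ct, generated_ct)
toy_psnr = psnr(reference_ct, generated_ct, data_range=1.0)
```

Because PSNR depends on `data_range`, values are only comparable between studies that evaluate on the same intensity scale.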
To locate nuclei in hematoxylin-eosin (HE) stained section images more simply, efficiently, and accurately, this paper proposes a new method based on distance estimation, which offers a new approach to locating nuclei in a clump image. Unlike mainstream methods, the proposed method avoids searching for the individual nuclei merged in a clump and locates the nuclei directly in the full image. The distance estimation is built on the matrix sequence of distance rough estimating (MSDRE); combined with the fact that the center of a convex region has the greatest distance to the boundary, it fixes the positions of nuclei quickly and precisely. In experiments, the method achieves a precision of 95.26% and a speed of 1.54 seconds per thousand nuclei, both better than mainstream methods on nucleus clump samples. The proposed method thus increases the efficiency of nucleus localization while maintaining its accuracy. This can help automatic HE image analysis systems by improving real-time performance and can promote the application of related research.
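The geometric fact used above, that the center of a convex region is the point farthest from the boundary, can be illustrated with a small sketch. This is not the MSDRE estimator, whose details the abstract does not give; it uses an exact brute-force Euclidean distance to the nearest background pixel instead, and the function name and the synthetic disk are assumptions for illustration.

```python
import numpy as np

def distance_to_background(mask):
    """Exact Euclidean distance from each foreground pixel to the nearest background pixel.

    Brute force, intended only for small masks; assumes at least one background pixel.
    """
    mask = np.asarray(mask, dtype=bool)
    foreground = np.argwhere(mask)    # (n_fg, 2) row/column coordinates
    background = np.argwhere(~mask)   # (n_bg, 2) row/column coordinates
    diffs = foreground[:, None, :] - background[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    dist = np.zeros(mask.shape, dtype=np.float64)
    dist[mask] = nearest              # same row-major order as np.argwhere
    return dist

# Synthetic "nucleus": a disk of radius 6 centered at (10, 10) in a 21 x 21 image.
rows, cols = np.mgrid[0:21, 0:21]
nucleus_mask = (rows - 10) ** 2 + (cols - 10) ** 2 <= 36
dist_map = distance_to_background(nucleus_mask)
center = tuple(int(i) for i in np.unravel_index(np.argmax(dist_map), dist_map.shape))
```

For touching nuclei, each roughly convex nucleus would contribute its own local maximum of the distance map, which is why a distance-based approach can locate nuclei without first splitting the clump.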
Medical image registration is very challenging because of the variety of imaging modalities, varying image quality, wide inter-patient variability, and intra-patient variability as disease progresses, together with strict requirements for robustness. Inspired by semantic models, especially the recent progress on computer vision tasks under the bag-of-visual-words framework, we set up a novel semantic model to match medical images. Because most medical images have poor contrast and a small dynamic range and contain only intensity information, traditional visual word models do not perform well on them. Building on the strengths of related work, we propose a novel visual word model, named directional visual words, which performs better on medical images, and we apply it to medical image registration. In our experiments, the critical anatomical structures were first specified manually by experts. We then used directional visual words, a coarse-to-fine spatial pyramid search strategy, and the k-means algorithm to locate the positions of the key structures accurately. Subsequently, we registered the corresponding images using the areas around these positions. Experiments on real cardiac images showed that our method can achieve high registration accuracy in some specific areas.
Coronavirus disease 2019 (COVID-19) has spread rapidly around the world. To diagnose COVID-19 more quickly, this paper proposes a depthwise separable DenseNet (DWSDenseNet). The deep learning model was built using 2,905 chest X-ray images as the experimental dataset. To enhance contrast, the X-ray images were preprocessed with the contrast limited adaptive histogram equalization (CLAHE) algorithm before training; the images were then fed into the network, and the network parameters were tuned to their optimal values. Leaky ReLU was selected as the activation function. VGG16, ResNet18, ResNet34, DenseNet121, and SDenseNet were used as comparison models. Compared with ResNet34, the proposed pneumonia classification model improved accuracy, sensitivity, and specificity by 2.0%, 2.3%, and 1.5%, respectively. Compared with SDenseNet, which does not use depthwise separable convolution, the proposed model had 43.9% fewer parameters with no decrease in classification performance. The proposed DWSDenseNet thus classifies the COVID-19 chest X-ray dataset well, and depthwise separable convolution effectively reduces the number of model parameters while preserving accuracy as far as possible.
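The parameter saving from depthwise separable convolution can be made concrete with a short counting sketch for a single layer. The layer sizes below are hypothetical and bias terms are ignored; the 43.9% reduction reported above is for the whole network and is not reproduced by this single-layer arithmetic.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
ratio = separable / standard   # equals 1 / c_out + 1 / k**2
```

The ratio 1/c_out + 1/k² shows why the per-layer saving is large for wide layers, while the saving for a full model is smaller because layers such as 1 × 1 convolutions and the classifier are unaffected.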
Objective To develop a deep learning-based automatic diagnostic tool for lumbar spine stability and validate its diagnostic accuracy. Methods Preoperative lumbar hyperflexion and hyperextension X-ray films were collected from 153 patients with lumbar disease. Three orthopedic surgeons marked the following 5 key points: the posteroinferior and anteroinferior corners of L4 and the posterosuperior, anterosuperior, and posteroinferior corners of L5. The labeling results of each surgeon were kept independently, giving three sets of labels in total. The 306 lumbar X-ray films were randomly divided into training (n=156), validation (n=50), and test (n=100) sets in a ratio of approximately 3∶1∶2. A new neural network architecture, Swin-PGNet, was proposed and trained on the annotated radiographs to automatically locate the lumbar vertebral key points and to calculate the L4-L5 intervertebral Cobb angle and the L4 lumbar sliding distance from the predicted key points. The mean error and intra-class correlation coefficient (ICC) were used as evaluation indices to compare the surgeons’ annotations with Swin-PGNet on three tasks (key point positioning, Cobb angle measurement, and lumbar sliding distance measurement). A change in Cobb angle of more than 11° was taken as the criterion for lumbar instability, and a lumbar sliding distance of more than 3 mm as the criterion for lumbar spondylolisthesis. The accuracy of the surgeons’ annotations and of Swin-PGNet in judging lumbar instability was compared. Results ① Key points: The mean error of key point location was (1.407±0.939) mm for Swin-PGNet and (3.034±2.612) mm among different surgeons. ② Cobb angle: The mean error was (2.062±1.352)° for Swin-PGNet and (3.580±2.338)° for surgeons. There was no significant difference between Swin-PGNet and surgeons (P>0.05), but there was a significant difference between different surgeons (P<0.05). ③ Lumbar sliding distance: The mean error was (1.656±0.878) mm for Swin-PGNet and (1.884±1.612) mm for surgeons. There was no significant difference between Swin-PGNet and surgeons or between different surgeons (P>0.05). The accuracy of lumbar instability diagnosis was 75.3% for surgeons and 84.0% for Swin-PGNet. The accuracy of lumbar spondylolisthesis diagnosis was 70.7% for surgeons and 71.3% for Swin-PGNet. There was no significant difference between Swin-PGNet and surgeons or between different surgeons (P>0.05). ④ Consistency of lumbar stability diagnosis: Among different surgeons, the ICC of Cobb angle was 0.913 [95%CI (0.898, 0.934)] (P<0.05) and the ICC of lumbar sliding distance was 0.741 [95%CI (0.729, 0.796)] (P<0.05), showing that the annotations of the three surgeons were consistent. Between Swin-PGNet and surgeons, the ICC of Cobb angle was 0.922 [95%CI (0.891, 0.938)] (P<0.05) and the ICC of lumbar sliding distance was 0.748 [95%CI (0.726, 0.783)] (P<0.05), showing that the annotations of Swin-PGNet were consistent with those of the surgeons. Conclusion The deep learning-based automatic diagnostic tool for lumbar instability can identify lumbar instability and spondylolisthesis automatically, accurately, and conveniently, and can effectively assist clinical diagnosis.
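The step from predicted key points to an instability decision can be sketched as follows. The abstract does not specify how the intervertebral angle is computed from the key points, so this sketch assumes it is the angle between the L4 inferior endplate line and the L5 superior endplate line, each drawn from the posterior to the anterior corner; only the 11° threshold comes from the abstract. The function names, the sign convention, and the coordinates are hypothetical.

```python
import numpy as np

INSTABILITY_THRESHOLD_DEG = 11.0  # criterion stated in the abstract

def line_angle_deg(p_start, p_end):
    """Orientation of the line from p_start to p_end, in degrees."""
    dx, dy = np.subtract(p_end, p_start)
    return float(np.degrees(np.arctan2(dy, dx)))

def intervertebral_angle_deg(l4_post_inf, l4_ant_inf, l5_post_sup, l5_ant_sup):
    """Signed angle between the L4 inferior and L5 superior endplates (assumed definition)."""
    diff = line_angle_deg(l4_post_inf, l4_ant_inf) - line_angle_deg(l5_post_sup, l5_ant_sup)
    return (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)

def is_unstable(flexion_angle_deg, extension_angle_deg):
    """True if the angle change between the two films exceeds the 11 degree criterion."""
    return abs(flexion_angle_deg - extension_angle_deg) > INSTABILITY_THRESHOLD_DEG

# Hypothetical key points (x, y): parallel endplates on one film,
# and an L4 endplate tilted by 15 degrees on the other.
tilt = np.tan(np.radians(15.0)) * 10.0
flexion = intervertebral_angle_deg((0, 0), (10, 0), (0, -5), (10, -5))
extension = intervertebral_angle_deg((0, 0), (10, tilt), (0, -5), (10, -5))
unstable = is_unstable(flexion, extension)
```

Because the decision depends only on the difference between the two films, a consistent sign convention matters more than the absolute orientation of the image axes. The 3 mm sliding distance criterion could be thresholded in the same way once the distance definition is fixed.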
In recent years, the convolutional neural network (CNN) has become a research hotspot in machine learning and has shown application value in computer-aided diagnosis. First, this paper briefly introduces the basic principle of CNN. Second, it summarizes improvements to the network structure along two dimensions: model structure and structure optimization. For model structure, it summarizes eleven classical CNN models from the past 60 years and traces their development along a timeline. For structure optimization, research progress is summarized for five aspects of CNN (input layer, convolution layer, down-sampling layer, fully connected layer, and the whole network). Third, learning algorithms are summarized in terms of optimization algorithms and algorithm fusion. For optimization algorithms, progress is organized by optimization purpose. For algorithm fusion, improvements are summarized from five angles: input layer, convolution layer, down-sampling layer, fully connected layer, and output layer. Finally, CNN is applied to the medical image domain and combined with computer-aided diagnosis to explore its applications in medical images. This review summarizes the state of CNN research and should be of value for its further development.