Objective To realize automated risk of bias assessment for randomized controlled trial (RCT) literature using BERT (Bidirectional Encoder Representations from Transformers) for feature representation and text classification. Methods We first searched The Cochrane Library to obtain risk of bias judgements and detailed information on RCTs, and constructed data sets for text classification, assigning 80% of the data to the training set, 10% to the test set, and 10% to the validation set. We then used BERT to extract features, build a text classification model, and classify each of the seven risk of bias domains as high or low risk. The results were compared with those of a traditional machine learning method combining n-gram and TF-IDF features with a Linear SVM classifier. Precision (P), recall (R), and F1 scores were used to evaluate model performance. Results The BERT-based model achieved F1 scores of 78.5% to 95.2% on the seven risk of bias assessment tasks, 14.7% higher than the traditional machine learning method. In the task of extracting bias-supporting descriptions, F1 scores of 85.7% to 92.8% were obtained for the six domains other than "other sources of bias", 18.2% higher than the traditional method. Conclusions The BERT-based automated risk of bias assessment model achieves higher accuracy in risk of bias assessment for RCT literature and improves the efficiency of assessment.
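For concreteness, the following minimal sketch contrasts the two pipelines described above: an n-gram TF-IDF representation with a Linear SVM baseline, and a BERT sequence classifier for a single bias domain. The library choices (scikit-learn, Hugging Face transformers), the model checkpoint, and the toy sentences are illustrative assumptions; the paper does not specify its tooling, and the fine-tuning loop over the 80/10/10 split is omitted.

```python
# Hedged sketch of the two compared pipelines; all data and tooling are assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical sentences for one bias domain (0 = low risk, 1 = high risk).
train_texts = ["Allocation was concealed with sealed opaque envelopes.",
               "Randomisation used a central computer-generated sequence.",
               "No information on allocation concealment was provided.",
               "Allocation was open to the investigators."]
train_labels = [0, 0, 1, 1]
test_texts, test_labels = ["Sequence generation was not described."], [1]

# Traditional baseline: word n-grams + TF-IDF features + Linear SVM.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
baseline.fit(train_texts, train_labels)
pred = baseline.predict(test_texts)
p, r, f1, _ = precision_recall_fscore_support(
    test_labels, pred, average="binary", zero_division=0)
print(f"baseline P={p:.2f} R={r:.2f} F1={f1:.2f}")

# BERT-based classifier: tokenize and score with a sequence-classification
# head; the fine-tuning loop on the training split is omitted for brevity.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # high vs. low risk of bias
batch = tokenizer(test_texts, padding=True, truncation=True,
                  return_tensors="pt")
pred_bert = model(**batch).logits.argmax(dim=-1)  # head untrained here
```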
Traditional manual testing of ventilator performance is labor-intensive, time-consuming, and prone to data-recording errors, making it difficult to meet current demands for testing efficiency in ventilator development and manufacturing. We therefore designed an automated testing system for the essential performance parameters of ventilators. The system mainly comprises a ventilator airflow analyzer, an automated switching module for simulated lungs, and a test control platform. Under the control of the testing software, the system performs automated tests of critical ventilator performance parameters and generates a final test report. To validate the effectiveness of the designed system, tests were conducted on two brands of ventilators under four operating conditions, comparing the accuracy of tidal volume, oxygen concentration, and positive end-expiratory pressure (PEEP) obtained with the automated testing system against traditional manual methods. Bland-Altman analysis indicated good agreement between automated and manual tests for all respiratory parameters. In terms of testing efficiency, the automated testing system required approximately one third of the time needed for manual testing. These results demonstrate that the designed automated testing system provides a novel approach to quality inspection and measurement calibration of ventilators, with broad application prospects.
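As an illustration of the agreement analysis named above, the sketch below computes the Bland-Altman mean bias and 95% limits of agreement for paired automated and manual readings. The function, variable names, and sample tidal-volume values are hypothetical, not taken from the study's data.

```python
# Minimal Bland-Altman sketch; all values below are illustrative assumptions.
import numpy as np

def bland_altman(auto, manual):
    """Return mean bias and 95% limits of agreement between two methods."""
    auto, manual = np.asarray(auto, float), np.asarray(manual, float)
    diff = auto - manual                         # per-measurement difference
    bias = diff.mean()                           # mean bias between methods
    sd = diff.std(ddof=1)                        # SD of the differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    return bias, loa

# Example: hypothetical paired tidal-volume readings (mL), one condition.
auto_vt   = [498, 505, 510, 496, 502]
manual_vt = [500, 507, 508, 495, 504]
bias, (lo, hi) = bland_altman(auto_vt, manual_vt)
print(f"bias={bias:.1f} mL, 95% LoA=({lo:.1f}, {hi:.1f}) mL")
```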
Objective To systematically review the accuracy and consistency of large language models (LLMs) in assessing risk of bias in analytical studies. Methods Cohort and case-control studies on COVID-19 were drawn from the team's previously published systematic review of the clinical characteristics of COVID-19. Two researchers independently screened the studies, extracted data, and assessed the risk of bias of the included studies; the LLM-based BiasBee model (Non-RCT version) was used for automated evaluation. Kappa statistics and score differences were used to analyze agreement between LLM and human evaluations, with subgroup analyses for Chinese- and English-language studies. Results A total of 210 studies were included. Meta-analysis showed that LLM scores were generally higher than those of human evaluators, particularly for representativeness of the exposed cohort (△=0.764) and selection of the external control (△=0.109). Kappa analysis indicated slight agreement on items such as exposure assessment (κ=0.059) and adequacy of follow-up (κ=0.093), while showing significant discrepancies on more subjective items, such as control selection (κ=−0.112) and non-response rate (κ=−0.115). Subgroup analysis revealed higher scoring consistency for LLMs in English-language studies than in Chinese-language studies. Conclusion LLMs demonstrate potential for risk of bias assessment; however, notable differences remain on more subjective tasks. Future research should focus on optimizing prompt engineering and model fine-tuning to enhance LLM accuracy and consistency in complex tasks.
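The following sketch illustrates the kind of agreement statistics reported above: Cohen's kappa between paired human and LLM item-level judgements (values below zero indicate worse-than-chance agreement, as for the negative κ values reported) and a mean score difference analogous to △. The use of scikit-learn and the toy ratings are assumptions for illustration only.

```python
# Hedged sketch of the kappa and score-difference analysis; data are toy values.
from sklearn.metrics import cohen_kappa_score

human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical human ratings (1 = met)
llm   = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical LLM (e.g., BiasBee) ratings

kappa = cohen_kappa_score(human, llm)                    # chance-corrected agreement
delta = sum(llm) / len(llm) - sum(human) / len(human)    # mean score difference (△)
print(f"kappa={kappa:.3f}, mean score difference={delta:.3f}")
```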