Objective To systematically review the accuracy and consistency of large language models (LLMs) in assessing risk of bias in analytical studies. Methods Cohort and case-control studies on COVID-19 were drawn from the team's published systematic review of the clinical characteristics of COVID-19. Two researchers independently screened the studies, extracted data, and assessed the risk of bias of the included studies, while the LLM-based BiasBee model (non-RCT version) was used for automated evaluation. Kappa statistics and score differences were used to analyze the agreement between LLM and human evaluations, with subgroup analyses for Chinese- and English-language studies. Results A total of 210 studies were included. Meta-analysis showed that LLM scores were generally higher than those of human evaluators, particularly for representativeness of exposed cohorts (Δ=0.764) and selection of external controls (Δ=0.109). Kappa analysis indicated slight agreement on items such as exposure assessment (κ=0.059) and adequacy of follow-up (κ=0.093), but significant discrepancies on more subjective items such as control selection (κ=−0.112) and non-response rate (κ=−0.115). Subgroup analysis revealed higher LLM scoring consistency for English-language studies than for Chinese-language studies. Conclusion LLMs demonstrate potential in risk of bias assessment; however, notable differences remain on more subjective tasks. Future research should focus on optimizing prompt engineering and model fine-tuning to enhance LLM accuracy and consistency in complex tasks.
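The agreement statistics reported above can be reproduced with standard tooling. Below is a minimal Python sketch, assuming scikit-learn is available, of how Cohen's kappa and the mean score difference (Δ) between LLM and human ratings might be computed for a single checklist item; the rating arrays are hypothetical illustrations, not data from the study.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-study ratings for one checklist item
# (1 = criterion judged as met, 0 = not met); not data from the study.
human = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
llm = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

kappa = cohen_kappa_score(human, llm)  # chance-corrected agreement
delta = float((llm - human).mean())    # mean score difference (LLM - human)

print(f"kappa = {kappa:.3f}, delta = {delta:.3f}")
```

A positive delta in this sketch corresponds to the direction reported above, with LLM scores exceeding human scores on average.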
Nonrandomized studies are an important method for evaluating the effects of exposures (including environmental, occupational, and behavioral exposures) on human health. The Risk Of Bias In Non-randomized Studies of Exposures (ROBINS-E) tool is used to evaluate the risk of bias in observational studies of natural or occupational exposures. This paper introduces the main contents of ROBINS-E 2022, including its background, seven domains, signalling questions, and operation process.
The COSMIN-RoB checklist comprises three sections with a total of 10 boxes and is used to evaluate the risk of bias of studies on content validity, internal structure, and other measurement properties. COSMIN classifies reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness as other measurement properties, which focus primarily on the quality of the (sub)scale as a whole rather than on the item level. Among these five measurement properties, reliability, measurement error, and criterion validity are the most widely studied. Therefore, this paper interprets the COSMIN-RoB checklist with examples to guide researchers in evaluating the risk of bias of studies on the reliability, measurement error, and criterion validity of PROMs.
The COSMIN community updated the COSMIN-RoB checklist on reliability and measurement error in 2021. The updated checklist can be applied to the assessment of all types of outcome measurement studies, including clinician-reported outcome measures (ClinROMs), performance-based outcome measurement instruments (PerFOMs), and laboratory values. To help readers better understand and apply the updated checklist, and to provide methodological references for conducting systematic reviews of ClinROMs, PerFOMs, and laboratory values, this paper interprets the updated COSMIN-RoB checklist on reliability and measurement error studies.
Studies on the measurement properties of patient-reported outcome measures (PROMs) aim to validate those measurement properties. Defects in the design or statistical analysis of such studies introduce bias, which affects the quality of PROMs. The COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) team therefore developed the COSMIN risk of bias (COSMIN-RoB) checklist to evaluate the risk of bias of studies on the measurement properties of PROMs. The checklist can be used in systematic reviews of PROM measurement properties, and PROM developers can also use it to guide research design during instrument development and thereby reduce bias. At present, similar assessment tools are lacking in China. Therefore, this article introduces the primary contents of the COSMIN-RoB checklist and interprets, with examples, how to evaluate the risk of bias of studies on the internal structure of PROMs.
Objective To interpret ROBIS, a new tool for evaluating the risk of bias in systematic reviews, in order to promote its comprehension and proper application. Methods We explained each item of the ROBIS tool, used it to evaluate the risk of bias of a selected intervention review titled Cyclophosphamide for Primary Nephrotic Syndrome of Children: A Systematic Review, and judged the risk of bias in that review. Results The selected systematic review as a whole was rated as “high risk of bias” because of high risk of bias in domains 2 to 4: identification and selection of studies; data collection and study appraisal; and synthesis and findings. The risk of bias in domain 1 (study eligibility criteria) was low. The relevance of the identified studies to the review’s research question was appropriately considered, and the reviewers avoided emphasizing results on the basis of their statistical significance. Conclusion ROBIS is a new tool worth recommending for evaluating the risk of bias in systematic reviews. Reviewers should use the ROBIS items as standards to conduct and produce high-quality systematic reviews.
With the rapid development of artificial intelligence (AI) and machine learning technologies, AI-based prediction models have become increasingly prevalent in the medical field. However, the PROBAST tool used to evaluate prediction models has shown growing limitations when assessing models built on AI technologies. Moons and colleagues therefore updated and expanded PROBAST into the PROBAST+AI tool, which is suitable for evaluating prediction model studies based on both AI and regression methods. It covers four domains (participants and data sources, predictors, outcomes, and analysis), allowing systematic assessment of the quality of model development, the risk of bias in model evaluation, and applicability. This article interprets the content and evaluation process of the PROBAST+AI tool, aiming to provide a reference and guidance for researchers in China using this tool.
The risk of bias assessment tool 2.0 (RoB 2.0) for cluster randomized trials and crossover trials was updated in a 2021 revision. This paper briefly reviews the history of the RoB 2.0 tool and explains and interprets the updated contents and the software operation process for cluster randomized trials and crossover trials. Compared with previous versions, the updated RoB 2.0 tool (2021 revision) uses more precise language and is easier to understand, and thus merits wider popularization and application.
This study introduces how to use PROBAST (Prediction model Risk Of Bias ASsessment Tool) to evaluate the risk of bias and applicability of diagnostic or prognostic prediction model studies, covering the tool's background, scope of application, and use. The tool involves four domains: participants, predictors, outcomes, and analysis. Risk of bias is evaluated across all four domains, while applicability is evaluated in the first three. PROBAST provides a standardized approach to the critical appraisal of diagnostic and prognostic prediction model studies, helping reviewers screen eligible literature for data analysis and establish a scientific basis for clinical decision-making.
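As a concrete illustration of the domain structure described above, the following Python sketch shows one plausible way to roll per-domain judgements up into overall ratings, following the commonly used PROBAST convention that any high-risk domain makes the overall rating high and that all domains must be low for an overall low rating; the domain ratings in the example are hypothetical.

```python
from typing import Dict, Iterable

ROB_DOMAINS = ("participants", "predictors", "outcomes", "analysis")
APPLICABILITY_DOMAINS = ROB_DOMAINS[:3]  # applicability uses the first three

def overall(ratings: Dict[str, str], domains: Iterable[str]) -> str:
    """Aggregate per-domain ratings ('low' / 'high' / 'unclear')."""
    values = [ratings[d] for d in domains]
    if any(v == "high" for v in values):
        return "high"    # one high-risk domain drives the overall rating
    if all(v == "low" for v in values):
        return "low"
    return "unclear"

# Hypothetical domain judgements for a single prediction model study
example = {"participants": "low", "predictors": "low",
           "outcomes": "unclear", "analysis": "high"}
print(overall(example, ROB_DOMAINS))            # -> high
print(overall(example, APPLICABILITY_DOMAINS))  # -> unclear
```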
High-quality randomized controlled trials are the best source of evidence on the relationship between health interventions and outcomes. However, when they are insufficient, indirect, or inappropriate, researchers may need to include non-randomized studies of interventions to strengthen the body of evidence and improve the certainty (quality) of evidence. The latest research from the GRADE working group provides a way for researchers to integrate randomized and non-randomized evidence. This paper introduces the relevant methods to provide guidance for systematic reviewers, health technology assessors, and guideline developers.