Objective: To evaluate the methodological and reporting quality of clinical practice guidelines and expert consensus statements for ambulatory surgery centers published since 2000, combining manual assessment with large language model (LLM) analysis, and to explore the feasibility of LLMs for quality evaluation.
Methods: We systematically searched Chinese and English databases and guideline repositories. Two researchers independently screened the literature and extracted data. Quality was assessed with the AGREE II and RIGHT tools, both manually and with the GPT-4o model.
Results: Fifty-four eligible documents were included. Mean compliance across AGREE II domains was: Scope and purpose 25.00%, Stakeholder involvement 20.16%, Rigor of development 17.28%, Clarity of presentation 41.56%, Applicability 18.06%, and Editorial independence 26.39%. Mean compliance across RIGHT domains was: Basic information 44.44%, Background 36.11%, Evidence 14.07%, Recommendations 34.66%, Review and quality assurance 3.70%, Funding and declaration and management of interests 24.54%, and Other information 27.16%. LLM-based evaluations yielded significantly higher scores than manual assessments with both tools. Subgroup analyses revealed superior quality in documents with evidence retrieval, conflict-of-interest disclosure, funding support, and LLM integration (P < 0.05).
Conclusion: Current day surgery guidelines and consensus statements need improved methodological and reporting quality. This study supports the supplementary value of LLMs in quality assessment while emphasizing that manual evaluation must remain the foundation.
Keywords: AGREE II; LLM; RIGHT; consensus; day surgery; guideline; quality assessment.
© 2025 The Author(s). Journal of Evidence‐Based Medicine published by Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.