Objective: To evaluate the methodological and reporting quality of clinical practice guidelines and expert consensus statements for ambulatory surgery centers published since 2000, combining manual assessment with large language model (LLM) analysis, and to explore the feasibility of LLMs for quality evaluation.
Methods: We systematically searched Chinese and English databases and guideline repositories. Two researchers independently screened the literature and extracted data. Quality was assessed with the AGREE II and RIGHT tools, both manually and with the GPT-4o model.
Results: Fifty-four eligible documents were included. Mean compliance across AGREE II domains was: Scope and purpose 25.00%, Stakeholder involvement 20.16%, Rigor of development 17.28%, Clarity of presentation 41.56%, Applicability 18.06%, and Editorial independence 26.39%. Mean compliance across RIGHT domains was: Basic information 44.44%, Background 36.11%, Evidence 14.07%, Recommendations 34.66%, Review and quality assurance 3.70%, Funding and declaration and management of interests 24.54%, and Other information 27.16%. LLM-based evaluations yielded significantly higher scores than manual assessments with both tools. Subgroup analyses revealed superior quality in documents with evidence retrieval, conflict-of-interest disclosure, funding support, and LLM integration (P < 0.05).
Conclusion: Current day surgery guidelines and consensus statements need improved methodological and reporting quality. This study supports the supplementary value of LLMs in quality assessment while emphasizing that manual evaluation must remain the foundation.
Keywords: AGREE II; LLM; RIGHT; consensus; day surgery; guideline; quality assessment.
© 2025 The Author(s). Journal of Evidence‐Based Medicine published by Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.