
Medical teacher. 2025 Jun 12:1-11. doi: 10.1080/0142159X.2025.2513419

Advancing medical education in cervical cancer control with large language models for multiple-choice question generation


Mingyang Chen  1, Jiayi Ma  2, Xiaoli Cui  3, Qianling Dai  4, Haiyan Hu  5, Yijin Wu  1, Sulaiya Husaiyin  6, Aiyuan Wu  7, Youlin Qiao  1

Author affiliations

  • 1 School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
  • 2 Tencent Sustainable Social Value Inclusive Health Lab, Tencent, Beijing, China.
  • 3 Department of Gynecologic Oncology, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, Shenyang, Liaoning Province, China.
  • 4 Department of Diagnosis and Treatment for Cervical Diseases, Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China.
  • 5 Department of Gynecology, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, Guangdong Province, China.
  • 6 Department of Gynecology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China.
  • 7 Wuxi Maternity and Child Health Care Hospital, Wuxi School of Medicine, Jiangnan University, Wuxi, Jiangsu Province, China.
  • DOI: 10.1080/0142159X.2025.2513419 PMID: 40504493

    Abstract

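    Objective: To explore the feasibility of using large language models (LLMs) to generate multiple-choice questions (MCQs) for cervical cancer control education and compare them with those created by clinicians.

    Methods: GPT-4o and Baichuan4 generated 40 MCQs each with iteratively refined prompts. Clinicians generated 40 MCQs for comparison. The 120 MCQs were rated by 12 experts on five dimensions (correctness, clarity and specificity, cognitive level, clinical relevance, and explainability) using a five-point Likert scale. Difficulty and discrimination were assessed by testing the questions on practitioners. Participants were also asked to identify the source of each MCQ.

    Results: LLM-generated MCQs were similar to clinician-generated MCQs on most dimensions, but clinician-generated MCQs scored higher on cognitive level (4.00±1.08) than those from GPT-4o (3.68±1.07) and Baichuan4 (3.7±1.13). Testing with 312 practitioners showed no significant differences in difficulty or discrimination among the three sources: clinicians (59.51±24.50, 0.38±0.14), GPT-4o (61.89±25.36, 0.30±0.19), and Baichuan4 (59.79±26.25, 0.33±0.15). The source of LLM-generated MCQs was correctly identified 32% to 50% of the time, and experts identified the question setter more accurately than non-expert practitioners.

    Conclusions: With optimized prompts, LLMs can generate MCQs of quality comparable to those written by clinicians. Although clinicians performed better on cognitive level, LLM-assisted MCQ generation can improve efficiency, but it requires rigorous validation to ensure educational quality.

    Keywords: large language models; cervical cancer; medical education; multiple-choice question generation.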
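The abstract reports a difficulty value and a discrimination value for each group of questions but does not say how they were computed. As a minimal sketch, assuming the classical test theory definitions (difficulty as the percentage of examinees answering an item correctly, discrimination as the difference in proportion correct between the top and bottom 27% of examinees by total score), the two indices could be calculated as follows. The function name `item_statistics`, the `group_fraction` parameter, and the 0/1 response matrix are illustrative assumptions, not details taken from the study.

```python
from typing import List, Tuple


def item_statistics(
    responses: List[List[int]], group_fraction: float = 0.27
) -> List[Tuple[float, float]]:
    """Return (difficulty_percent, discrimination_index) for each item.

    responses[e][i] is 1 if examinee e answered item i correctly, else 0.
    Assumption: classical upper/lower-group method, not the paper's
    confirmed procedure.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])

    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)

    # Size of the upper and lower comparison groups (at least one examinee).
    k = max(1, round(n_examinees * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    stats = []
    for i in range(n_items):
        # Difficulty: percentage of all examinees who got the item right.
        difficulty = 100.0 * sum(r[i] for r in responses) / n_examinees
        # Discrimination: proportion correct in upper group minus lower group.
        p_upper = sum(r[i] for r in upper) / k
        p_lower = sum(r[i] for r in lower) / k
        stats.append((difficulty, p_upper - p_lower))
    return stats
```

Under these definitions, a higher difficulty percentage means an easier item, and a discrimination index near zero means the item does not separate stronger from weaker examinees.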

    Copyright © Medical teacher.

    Journal information

    Journal: Medical teacher

    Abbreviation: MED TEACH

    ISSN: 0142-159X

    e-ISSN: 1466-187X

    IF/Quartile: 4.4/Q1
