
Journal of Biomedical Informatics. 2025 Jun 5:104854. doi: 10.1016/j.jbi.2025.104854

Monitoring strategies for continuous evaluation of deployed clinical prediction models


Grace Y E Kim  1, Conor K Corbin  2, François Grolleau  3, Michael Baiocchi  4, Jonathan H Chen  5

Author affiliations

  • 1 Center for Biomedical Informatics Research, Stanford, California, USA. Electronic address: yek1354@stanford.edu.
  • 2 Center for Biomedical Informatics Research, Stanford, California, USA; Department of Biomedical Data Science, Stanford, California, USA.
  • 3 Center for Biomedical Informatics Research, Stanford, California, USA.
  • 4 Department of Epidemiology and Population Health, Stanford, California, USA.
  • 5 Center for Biomedical Informatics Research, Stanford, California, USA; Division of Hospital Medicine, Stanford, California, USA; Clinical Excellence Research Center, Stanford, California, USA.
  • DOI: 10.1016/j.jbi.2025.104854 PMID: 40482691

    Abstract

    Objective: As machine learning adoption in clinical practice continues to grow, deployed classifiers must be continuously monitored and updated (retrained) to protect against data drift stemming from inevitable changes, including evolving medical practices and shifting patient populations. However, a successful clinical machine learning classifier will itself change care, which may change the distribution of features, labels, and their relationship. For example, "high risk" cases that were correctly identified by the model may ultimately be labeled "low risk" thanks to an intervention prompted by the model's alert. Classifier surveillance systems naive to such deployment-induced feedback loops will underestimate model performance and lead to degraded future classifier retrains. The objective of this study is to simulate the impact of these feedback loops, propose feedback-aware monitoring strategies as a solution, and assess the performance of these alternative monitoring strategies through simulations.
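The feedback loop described above is easy to reproduce in a toy simulation (our own illustrative setup, not the paper's: the risk model, alert threshold, and 80% adherence rate are all assumptions). Alerts on high-score patients trigger an effective intervention, so naive monitoring against observed labels understates the model's true discrimination:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000

# Latent risk drives both the true outcome and the model's score.
risk = rng.uniform(size=n)
score = np.clip(risk + rng.normal(0, 0.2, size=n), 0, 1)       # a decent model
y_no_treatment = (rng.uniform(size=n) < risk).astype(int)      # outcome if untreated

# Deployment: high-score patients trigger an alert; clinicians adhere 80% of
# the time, and the intervention prevents the adverse outcome when applied.
alert = score > 0.7
treated = alert & (rng.uniform(size=n) < 0.8)
y_observed = np.where(treated, 0, y_no_treatment)              # alerts flip labels to 0

auroc_true = roc_auc_score(y_no_treatment, score)   # vs. no-treatment outcomes
auroc_naive = roc_auc_score(y_observed, score)      # what naive monitoring sees
print(f"AUROC vs. no-treatment outcomes: {auroc_true:.2f}")
print(f"Naive AUROC under feedback loop: {auroc_naive:.2f}")
```

Because the intervention removes exactly the positives the model ranked highest (and leaves high-scoring negatives behind), the naive estimate is biased downward even though the model itself has not changed.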

    Methods: We propose Adherence Weighted and Sampling Weighted Monitoring as two feedback-loop-aware surveillance strategies. Through simulation, we evaluate their ability to accurately appraise post-deployment model performance and to initiate safe and accurate classifier retraining.
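The abstract does not spell out the estimators, but the general idea behind adherence-style weighting can be sketched as follows (a minimal sketch under our own assumptions: alert status and the 80% adherence probability are known). Evaluation is restricted to patients who went untreated, with each alerted-but-untreated patient weighted by the inverse probability of remaining untreated, so the reweighted sample approximates the no-treatment population:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 50_000

# Same toy deployment as before: latent risk, a noisy score, alerts above a
# threshold, and 80% clinician adherence that suppresses the outcome.
risk = rng.uniform(size=n)
score = np.clip(risk + rng.normal(0, 0.2, size=n), 0, 1)
y0 = (rng.uniform(size=n) < risk).astype(int)           # no-treatment outcome
alert = score > 0.7
treated = alert & (rng.uniform(size=n) < 0.8)
y_obs = np.where(treated, 0, y0)                        # labels we actually observe

# Weighted estimate: keep only untreated patients; weight alerted-but-untreated
# ones by 1 / P(untreated | alert) = 1 / 0.2 to undo the selective removal.
untreated = ~treated
w = np.where(alert[untreated], 1 / 0.2, 1.0)
auroc_weighted = roc_auc_score(y_obs[untreated], score[untreated], sample_weight=w)

auroc_naive = roc_auc_score(y_obs, score)               # feedback-contaminated
auroc_true = roc_auc_score(y0, score)                   # oracle (no-treatment)
print(f"true={auroc_true:.3f} naive={auroc_naive:.3f} weighted={auroc_weighted:.3f}")
```

Among untreated patients, observed labels coincide with the no-treatment outcome, and within the alerted stratum being untreated is independent of the outcome, so the inverse-probability weights recover an unbiased performance estimate.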

    Results: Measured across accuracy, area under the receiver operating characteristic curve (AUROC), average precision, Brier score, expected calibration error, F1, precision, sensitivity, and specificity, in the presence of feedback loops the Adherence Weighted and Sampling Weighted strategies have the highest fidelity to the ground-truth classifier performance, while standard approaches yield the most inaccurate estimates. Furthermore, in simulations with true data drift, retraining using standard unweighted approaches results in an AUROC of 0.52 (down from 0.72). In contrast, retraining based on the Adherence Weighted and Sampling Weighted strategies recovers performance to 0.67, comparable to what a new model trained from scratch on the existing and shifted data would obtain.
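The same inverse-probability idea carries over to retraining. The sketch below is our own illustrative setup (the features, coefficients, alert rule, and 90% adherence are assumptions, not the paper's experiment, which reports 0.52 vs. 0.67): a naive model refit on feedback-contaminated labels learns an attenuated effect for the alert-driving feature, while a weighted refit on untreated patients recovers it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 40_000
beta = np.array([1.5, -1.0, 0.5])        # true risk coefficients (assumed)

def sample(m):
    """Draw features and no-treatment outcomes from the assumed risk model."""
    x = rng.normal(size=(m, 3))
    y = (rng.uniform(size=m) < 1 / (1 + np.exp(-x @ beta))).astype(int)
    return x, y

x, y0 = sample(n)
# Alerts fire on the first feature; 90% adherence suppresses the outcome.
alert = x[:, 0] > 0.0
treated = alert & (rng.uniform(size=n) < 0.9)
y_obs = np.where(treated, 0, y0)

# Naive retrain on the feedback-contaminated labels.
naive = LogisticRegression().fit(x, y_obs)

# Weighted retrain: untreated patients only, weighted by 1 / P(untreated).
untreated = ~treated
w = np.where(alert[untreated], 1 / 0.1, 1.0)
weighted = LogisticRegression().fit(x[untreated], y_obs[untreated], sample_weight=w)

x_test, y_test = sample(10_000)          # holdout from the same population
auc_naive = roc_auc_score(y_test, naive.predict_proba(x_test)[:, 1])
auc_weighted = roc_auc_score(y_test, weighted.predict_proba(x_test)[:, 1])
print(f"naive retrain AUROC={auc_naive:.3f}  weighted retrain AUROC={auc_weighted:.3f}")
```

The naive refit treats intervention-averted events as true negatives and so partially unlearns the very signal that made the alerts useful; the weighted refit targets the no-treatment outcome instead.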

    Conclusion: Compared to standard approaches, the Adherence Weighted and Sampling Weighted strategies yield more accurate classifier performance estimates, measured according to the no-treatment potential outcome. Retraining based on these strategies brings stronger performance recovery against data drift and feedback loops than do standard approaches.

    Keywords: Clinical prediction; Feedback loops; Machine learning; Model retraining; Post deployment performance monitoring; clinical informatics.



    Journal information

    Journal: Journal of Biomedical Informatics

    Abbreviation: J BIOMED INFORM

    ISSN: 1532-0464

    e-ISSN: 1532-0480

    Impact factor / quartile: 4.0 / Q2
