Journal of biomedical informatics. 2025 Jun 5:104854. doi: 10.1016/j.jbi.2025.104854

Monitoring strategies for continuous evaluation of deployed clinical prediction models

Grace Y E Kim¹, Conor K Corbin², François Grolleau³, Michael Baiocchi⁴, Jonathan H Chen⁵

Author affiliations

¹ Center for Biomedical Informatics Research, Stanford, California, USA. Electronic address: yek1354@stanford.edu.

² Center for Biomedical Informatics Research, Stanford, California, USA; Department of Biomedical Data Science, Stanford, California, USA.

³ Center for Biomedical Informatics Research, Stanford, California, USA.

⁴ Department of Epidemiology and Population Health, Stanford, California, USA.

⁵ Center for Biomedical Informatics Research, Stanford, California, USA; Division of Hospital Medicine, Stanford, Stanford, California, USA; Clinical Excellence Research Center, Stanford, California, USA.

DOI: 10.1016/j.jbi.2025.104854 PMID: 40482691

Abstract

Objective: As machine learning adoption in clinical practice grows, deployed classifiers must be continuously monitored and updated (retrained) to protect against data drift arising from inevitable changes, including evolving medical practices and shifting patient populations. However, a successful clinical machine learning classifier will change care, which may in turn change the distribution of features, labels, and their relationship. For example, "high risk" cases correctly identified by the model may ultimately be labeled "low risk" because of an intervention prompted by the model's alert. Classifier surveillance systems that are naive to such deployment-induced feedback loops will underestimate model performance and degrade future classifier retraining. The objective of this study is to simulate the impact of these feedback loops, propose feedback-aware monitoring strategies as a solution, and assess the performance of these alternative monitoring strategies through simulations.
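The direction of the bias in this example can be worked out on a confusion matrix. The following is a minimal sketch under hypothetical assumptions, not the authors' simulation: each correctly alerted high-risk patient is treated with probability `adherence`, treatment prevents the outcome with probability `efficacy`, and prevented cases are recorded as negatives. Those cases therefore move from true positives to false positives while the other cells stay fixed. The function names are illustrative only.

```python
def true_metrics(tp, fp, fn, tn):
    """Metrics against the no-treatment outcome (what the model actually achieved)."""
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def naive_metrics(tp, fp, fn, tn, adherence, efficacy):
    """Expected metrics a feedback-naive monitor would report (hypothetical model).

    A fraction adherence * efficacy of true positives is treated successfully and
    recorded as negative, so those cases move from TP to FP; FN and TN are unchanged.
    """
    averted = tp * adherence * efficacy
    return true_metrics(tp - averted, fp + averted, fn, tn)


print(true_metrics(80, 40, 20, 860))
print(naive_metrics(80, 40, 20, 860, adherence=0.5, efficacy=0.5))
```

With 80 TP, 40 FP, 20 FN, 860 TN and adherence = efficacy = 0.5, 20 true positives are relabeled. Observed sensitivity therefore falls from 0.80 to 0.75, and precision falls from about 0.67 to 0.50, even though the model itself has not changed.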

Methods: We propose Adherence Weighted Monitoring and Sampling Weighted Monitoring as two feedback-loop-aware surveillance strategies. Through simulation, we evaluate their ability to accurately appraise post-deployment model performance and to initiate safe and accurate classifier retraining.
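The abstract does not specify the estimators behind these two strategies. One plausible reading, assumed here and not taken from the paper, is inverse-probability weighting toward the no-treatment potential outcome. Treated patients are dropped because their untreated outcome is unobserved, and each untreated alerted patient is up-weighted by 1 / P(untreated | alert). Under an adherence-based weighting that probability would be 1 - adherence; under a sampling-based design it would be the known probability that an alert was withheld. The toy simulator, its parameters, and the helper names are all hypothetical.

```python
import random


def simulate(n=50_000, threshold=0.5, adherence=0.7, efficacy=0.8, seed=1):
    """Toy deployment (hypothetical): one (alert, treated, y0, y_obs) tuple per patient."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.random()                                  # true outcome probability
        y0 = int(rng.random() < risk)                        # no-treatment potential outcome
        alert = risk + rng.gauss(0.0, 0.15) >= threshold     # imperfect model fires an alert
        treated = alert and rng.random() < adherence         # clinician follows the alert
        y_obs = 0 if treated and rng.random() < efficacy else y0  # label actually recorded
        rows.append((alert, treated, y0, y_obs))
    return rows


def ipw_weight(p_untreated_if_alert):
    """Assumed weighting: 0 for treated patients, 1 / P(untreated | alert) for
    untreated alerted patients, 1 for patients who were never alerted."""
    def weight(row):
        alert, treated = row[0], row[1]
        if treated:
            return 0.0
        return 1.0 / p_untreated_if_alert if alert else 1.0
    return weight


def unit(row):
    """Unweighted monitoring: every patient counts once."""
    return 1.0


def weighted_sensitivity(rows, weight, label_index):
    """Sensitivity (alert = predicted positive) with per-patient weights."""
    tp = pos = 0.0
    for row in rows:
        w = weight(row)
        if w == 0.0 or row[label_index] != 1:
            continue
        pos += w
        if row[0]:
            tp += w
    return tp / pos


adherence = 0.7
rows = simulate(adherence=adherence)
truth = weighted_sensitivity(rows, unit, label_index=2)   # oracle: uses y0
naive = weighted_sensitivity(rows, unit, label_index=3)   # observed labels, unweighted
weighted = weighted_sensitivity(rows, ipw_weight(1 - adherence), label_index=3)
print(f"truth {truth:.3f}  naive {naive:.3f}  adherence-weighted {weighted:.3f}")
```

Under the same assumptions, a sampling-based variant would pass the alert-withholding probability to `ipw_weight` instead. This only holds if withheld alerts are the sole way an alerted patient goes untreated. If clinicians can also ignore alerts that are shown, P(untreated | alert) has to combine both routes.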

Results: In the presence of feedback loops, and measured across accuracy, area under the receiver operating characteristic curve (AUROC), average precision, Brier score, expected calibration error, F1, precision, sensitivity, and specificity, the Adherence Weighted and Sampling Weighted strategies have the highest fidelity to the ground-truth classifier performance, while standard approaches yield the least accurate estimates. Furthermore, in simulations with true data drift, retraining with standard unweighted approaches results in an AUROC of 0.52 (down from 0.72). In contrast, retraining based on the Adherence Weighted and Sampling Weighted strategies recovers performance to 0.67, comparable to what a new model trained from scratch on the existing and shifted data would obtain.
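For a weighted monitoring strategy to use the metrics listed above, each metric has to accept per-patient weights. The plain-Python sketch below shows weighted versions of three of them. The Brier score is a weighted mean squared error. Expected calibration error uses equal-width bins, and the 10-bin default is an assumption. AUROC uses the pairwise Mann-Whitney form, with ties counted as one half. In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss` accept a `sample_weight` argument. The function names here are illustrative.

```python
def weighted_brier(y, p, w):
    """Weighted mean squared error between predicted probability and outcome."""
    return sum(wi * (pi - yi) ** 2 for yi, pi, wi in zip(y, p, w)) / sum(w)


def weighted_ece(y, p, w, n_bins=10):
    """Expected calibration error over equal-width probability bins (bin count assumed)."""
    bins = [[0.0, 0.0, 0.0] for _ in range(n_bins)]  # [total weight, weighted p, weighted y]
    for yi, pi, wi in zip(y, p, w):
        b = min(int(pi * n_bins), n_bins - 1)
        bins[b][0] += wi
        bins[b][1] += wi * pi
        bins[b][2] += wi * yi
    total = sum(w)
    # sum over bins of (W_b / W) * |mean p_b - mean y_b|, simplified
    return sum(abs(sp - sy) / total for bw, sp, sy in bins if bw > 0)


def weighted_auroc(y, s, w):
    """Pairwise (Mann-Whitney) AUROC with per-patient weights; ties count one half. O(n^2)."""
    num = den = 0.0
    for yi, si, wi in zip(y, s, w):
        if yi != 1:
            continue
        for yj, sj, wj in zip(y, s, w):
            if yj != 0:
                continue
            pair = wi * wj
            den += pair
            num += pair * (1.0 if si > sj else 0.5 if si == sj else 0.0)
    return num / den


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
w = [1.0, 1.0, 2.0, 1.0]  # e.g. an up-weighted untreated alerted patient
print(round(weighted_brier(y, p, w), 3), round(weighted_ece(y, p, w), 3), weighted_auroc(y, p, w))
```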

Conclusion: Compared with standard approaches, the Adherence Weighted and Sampling Weighted strategies yield more accurate estimates of classifier performance, measured against the no-treatment potential outcome. Retraining based on these strategies achieves stronger performance recovery than standard approaches when tested against data drift and feedback loops.
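The retraining comparison can be illustrated with a deliberately simple sketch. A one-feature logistic model is refit on post-deployment labels twice: once unweighted, and once with inverse-probability weights that drop treated patients and up-weight untreated alerted ones. The data-generating coefficients, the alert threshold, and the `adherence` and `efficacy` parameters are hypothetical. The weighting is an assumed stand-in for the paper's Adherence Weighted strategy, not a reproduction of it.

```python
import math
import random


def fit_weighted_logistic(x, y, w, iters=15):
    """Weighted one-feature logistic regression by Newton-Raphson; returns (intercept, slope)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            if wi == 0.0:
                continue
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = wi * (yi - p)            # weighted score (gradient) contribution
            g0 += r
            g1 += r * xi
            v = wi * p * (1.0 - p)       # weighted Fisher information contribution
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # closed-form solve of the 2x2 Newton system
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(n=20_000, a=-1.0, b=2.0, threshold=0.5, adherence=0.7, efficacy=0.8, seed=2):
    """Hypothetical post-deployment data: feature x, observed label, naive and IPW weights."""
    rng = random.Random(seed)
    x, y_obs, w_naive, w_ipw = [], [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        y0 = int(rng.random() < 1.0 / (1.0 + math.exp(-(a + b * xi))))
        alert = xi >= threshold
        treated = alert and rng.random() < adherence
        x.append(xi)
        y_obs.append(0 if treated and rng.random() < efficacy else y0)
        w_naive.append(1.0)
        # assumed IPW scheme: drop treated, up-weight untreated alerted patients
        w_ipw.append(0.0 if treated else (1.0 / (1.0 - adherence) if alert else 1.0))
    return x, y_obs, w_naive, w_ipw


x, y, w_naive, w_ipw = simulate()
naive = fit_weighted_logistic(x, y, w_naive)
weighted = fit_weighted_logistic(x, y, w_ipw)
print("data-generating (intercept, slope): (-1.00, 2.00)")
print(f"naive retrain:    ({naive[0]:.2f}, {naive[1]:.2f})")
print(f"weighted retrain: ({weighted[0]:.2f}, {weighted[1]:.2f})")
```

This sketch does not model data drift. It isolates only the feedback-loop effect: the unweighted refit learns a flattened risk slope because successfully treated high-risk cases appear as negatives.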

Keywords: Clinical prediction; Feedback loops; Machine learning; Model retraining; Post deployment performance monitoring; clinical informatics.

Keywords: clinical prediction models; continuous evaluation

Related content

Journal: Journal of biomedical informatics

Abbreviation: J BIOMED INFORM

ISSN: 1532-0464

e-ISSN: 1532-0480

Impact factor/quartile: 4.0/Q2

A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis

Clinical Prediction Models and Predictors for Death or Adverse Neurodevelopmental Outcome in Term Newborns with Hypoxic-Ischemic Encephalopathy: A Systematic Review of the Literature

Joint forces for making clinical prediction models contribute to science

Overview of clinical prediction models

Clinical prediction models

Clinical prediction models

Evidence of questionable research practices in clinical prediction models

Systematic Review of Clinical Prediction Models for the Risk of Emergency Caesarean Births

Clinical Prediction Models for Suspected Pediatric Foreign Body Aspiration: A Systematic Review and Meta-analysis
