Journal of biomedical informatics. 2025 Apr 30:104839. doi: 10.1016/j.jbi.2025.104839

Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies

Xinyao Jian 1, Dazheng Zhang 1, Zehao Yu 2, Hua Xu 3, Jiang Bian 2, Yonghui Wu 4, Jiayi Tong 5, Yong Chen 6

Author affiliations

  • 1 The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA.
  • 2 Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
  • 3 Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA.
  • 4 Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
  • 5 The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
  • 6 The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA; The Graduate Group in Applied Mathematics and Computational Science, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA; Leonard Davis Institute of Health Economics, Philadelphia, PA, USA; Penn Medicine Center for Evidence-based Practice (CEP), Philadelphia, PA, USA; Penn Institute for Biomedical Informatics (IBI), Philadelphia, PA, USA. Electronic address: ychen123@upenn.edu.
  • DOI: 10.1016/j.jbi.2025.104839 PMID: 40316004

    Abstract

    Objectives: In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patient clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, involves clinicians determining the true disease status based on their assessment of health records. These clinician-labeled phenotypes are labor-intensive to obtain and typically limited to a small subset of patients, and they can include a third "undecided" category when a phenotype is indeterminate. We aim to effectively integrate the algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies.

    Material and methods: We propose an augmented estimation method that combines the binary algorithm-derived phenotypes for the entire cohort with the trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare-disease scenarios. The proposed method, trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA), was evaluated across a wide range of simulation settings and real-world applications, including EHR data on Alzheimer's disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network and cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington.
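
The sampling idea in the methods paragraph can be illustrated with a minimal sketch. It is not the TriCA estimator, whose details the abstract does not give. It only shows one simple form of outcome-dependent sampling: the validation set is stratified on the algorithm-derived phenotype so that algorithm-positive patients are oversampled, and inverse-probability weights map the trinary chart-review labels back to the full cohort. Every name and number here (`simulate_cohort`, `chart_review`, the prevalence, sensitivity, specificity, and undecided rate) is a hypothetical assumption made for illustration.

```python
import random
from collections import Counter


def simulate_cohort(n, prevalence=0.05, sensitivity=0.8, specificity=0.95, seed=0):
    """Hypothetical cohort: true status y and binary algorithm-derived phenotype s."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        p_positive = sensitivity if y == 1 else 1 - specificity
        s = 1 if rng.random() < p_positive else 0
        cohort.append({"y": y, "s": s})
    return cohort


def chart_review(patient, rng, p_undecided=0.1):
    """Hypothetical reviewer: returns 1, 0, or 'undecided' (trinary label)."""
    if rng.random() < p_undecided:
        return "undecided"
    return patient["y"]


def outcome_dependent_sample(cohort, n_pos, n_neg, seed=1):
    """Sample within strata of s; each sampled patient gets weight N_s / n_s."""
    rng = random.Random(seed)
    sample = []
    for s, size in ((1, n_pos), (0, n_neg)):
        members = [p for p in cohort if p["s"] == s]
        k = min(size, len(members))
        if k == 0:
            continue
        weight = len(members) / k  # inverse of the sampling probability
        for patient in rng.sample(members, k):
            sample.append((patient, weight))
    return sample


def weighted_label_shares(sample, seed=2):
    """Weighted share of each chart-review label, scaled to the whole cohort."""
    rng = random.Random(seed)
    totals = Counter()
    for patient, weight in sample:
        totals[chart_review(patient, rng)] += weight
    total = sum(totals.values())
    return {label: totals[label] / total for label in (1, 0, "undecided")}


cohort = simulate_cohort(20000)
sample = outcome_dependent_sample(cohort, n_pos=200, n_neg=200)
shares = weighted_label_shares(sample)
print(shares)
```

Oversampling algorithm-positive patients puts more true cases in front of reviewers when the disease is rare, and the weights correct for the unequal sampling probabilities. How the undecided labels then enter the association model is the part that the proposed method addresses and this sketch does not.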

    Results: Compared to estimation based on random sampling, our augmented method reduced mean squared error by up to 28.3% in simulation studies; compared to estimation using only trinary chart-reviewed phenotypes, our method improved efficiency by up to 33.3% in ADRD data and 50.8% in SBCE data.

    Discussion: Our simulation studies and real-world applications demonstrate that, compared to existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.

    Conclusion: The proposed method effectively combines binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, which broadens its applicability and enhances risk factor identification in EHR-based association studies.

    Keywords: Association study; Augmented estimation; Electronic health records; Manual chart review; Outcome-dependent sampling; Phenotyping error; Undecided cases; Chart-reviewed phenotypes.

    Copyright © Journal of biomedical informatics.

    Related content

    Journal: Journal of biomedical informatics

    Abbreviation: J BIOMED INFORM

    ISSN: 1532-0464

    e-ISSN: 1532-0480

    IF/Quartile: 4.0/Q2
