
Methods of information in medicine. 2025 Apr 17. doi: 10.1055/a-2590-6456 (Q3; IF 1.8; 2025)

Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms


Spencer Krichevsky, Evan T Sholle  1, Prakash Adekkanattu  2, Sajjad Abedian  1, Madhu Ouseph  3, Elwood Taylor  4, Ghaith Abu-Zeinah  4, Diana Jaber  4, Claudia Sosner  4, Marika M Cusick  1, Niamh Savage  4, Richard T Silver  4, Joseph M Scandura  4, Thomas Campion  5

Author affiliations

  • 1 Information Technologies & Services, Weill Cornell Medicine, New York, United States.
  • 2 Health Policy Research, Weill Cornell Medical College, New York, United States.
  • 3 Department of Pathology, Weill Cornell Medicine, New York, United States.
  • 4 Richard T. Silver Myeloproliferative Neoplasms Center, Division of Hematology and Medical Oncology, Weill Cornell Medicine, New York, United States.
  • 5 Pediatrics, Public Health, Cornell Weill University, New York, United States.
  • DOI: 10.1055/a-2590-6456 PMID: 40245940

    Abstract

    Background: Assessing treatment response in patients with myeloproliferative neoplasms is difficult because data components exist in unstructured bone marrow pathology (hematopathology) reports, which require specialized, manual annotation and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology.

    Methods: An open-source NLP framework called Leo was implemented to parse document segments and extract concept phrases utilized for assessing responses in myeloproliferative neoplasms. A reference standard was generated through the manual review of hematopathology notes.
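As a rough illustration of the two steps named in the Methods (parsing document segments, then extracting concept phrases), the following is a minimal regex-based sketch. It does not use Leo or reproduce the authors' pipeline; the section headers, the patterns, the feature names, and the sample report text are all hypothetical assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Hypothetical section headers; real hematopathology reports may use others.
SECTION_PATTERN = re.compile(r"^(ASPIRATE|BIOPSY|FLOW CYTOMETRY):", re.MULTILINE)

# Hypothetical concept patterns: a blast percentage and a reticulin fibrosis grade.
BLAST_PATTERN = re.compile(
    r"(?:myelo)?blasts?\D{0,20}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
)
FIBROSIS_PATTERN = re.compile(r"reticulin fibrosis\D{0,20}?(\d)", re.IGNORECASE)


def split_sections(report: str) -> Dict[str, str]:
    """Split a report into {header: body} using the header pattern."""
    matches = list(SECTION_PATTERN.finditer(report))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1)] = report[match.end():end].strip()
    return sections


def first_number(pattern: "re.Pattern[str]", text: str) -> Optional[float]:
    """Return the first captured number for a pattern, or None if absent."""
    match = pattern.search(text)
    return float(match.group(1)) if match else None


def extract_features(report: str) -> Dict[str, Optional[float]]:
    """Extract section-specific features; missing values stay None."""
    sections = split_sections(report)
    return {
        "aspirate_myeloblasts_pct": first_number(
            BLAST_PATTERN, sections.get("ASPIRATE", "")
        ),
        "biopsy_reticulin_grade": first_number(
            FIBROSIS_PATTERN, sections.get("BIOPSY", "")
        ),
    }


# Invented sample text, not taken from any real report.
SAMPLE_REPORT = (
    "ASPIRATE: Cellular smears. Myeloblasts 2% of nucleated cells.\n"
    "BIOPSY: Hypercellular marrow. Reticulin fibrosis grade 2.\n"
)
features = extract_features(SAMPLE_REPORT)
```

Scoping each pattern to its own section is what lets the same concept (for example, myeloblasts) be reported separately for the aspirate, the biopsy, and flow cytometry.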

    Results: Compared to a reference standard (n=300 reports), our NLP method extracted features such as aspirate myeloblasts (F1:0.98) and biopsy reticulin fibrosis (F1:0.93) with high accuracy. However, other values, such as myeloblasts from the biopsy (F1:0.06) and via flow cytometry (F1:0.08), were affected by sparsity representative of reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 0.90. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports.
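For readers less familiar with the metric, the sketch below shows how an F1 score is computed from true-positive, false-positive, and false-negative counts, and works through the per-report time arithmetic using the figures quoted in the Results. The counts passed to `f1_score` are invented for illustration and are not the study's data; the function names are assumptions of this example.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def seconds_per_report(hours: float, n_reports: int) -> float:
    """Average time spent per report, in seconds."""
    return hours * 3600 / n_reports


# Hypothetical counts: 49 correct extractions, 1 spurious, 1 missed.
example_f1 = f1_score(tp=49, fp=1, fn=1)

# Figures from the abstract: 30 staff hours for 300 reports (manual)
# versus 3.5 hours of runtime for 34,301 reports (automated).
manual_seconds = seconds_per_report(30, 300)
automated_seconds = seconds_per_report(3.5, 34301)
ratio = manual_seconds / automated_seconds
```

On these figures, manual annotation averages 6 minutes per report and the automated run averages roughly 0.37 seconds per report, a ratio of about 980. The comparison is only indicative, since staff effort and machine runtime are different kinds of cost.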

    Conclusions: To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for the purpose of clinical feature extraction. The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.

    Keywords: information extraction; unstructured reports; hematopathology; myeloproliferative neoplasms

    Copyright © Methods of information in medicine.

    Related information

    Journal: Methods of information in medicine

    Abbreviation: METHOD INFORM MED

    ISSN: 0026-1270

    e-ISSN: 2511-705X

    IF/Quartile: 1.8/Q3
