
Medical & biological engineering & computing. 2025 Apr 8. doi: 10.1007/s11517-025-03355-5 (Q3; IF 2.6; 2024)

A machine learning approach for type 2 diabetes diagnosis and prognosis using tailored heterogeneous feature subsets


J Ramón Navarro-Cerdán  1  2, Pedro Pons-Suñer  3, Laura Arnal  3, Joaquim Arlandis  4  5, Rafael Llobet  4  5, Juan-Carlos Perez-Cortes  4  5, Francisco Lara-Hernández  6, Celeste Moya-Valera  6, Maria Elena Quiroz-Rodriguez  6, Gemma Rojo-Martinez  7  8, Sergio Valdés  7  8, Eduard Montanya  7  9  10, Alfonso L Calle-Pascual  11  12, Josep Franch-Nadal  7  13, Elias Delgado  14  15, Luis Castaño  7  15  16, Ana-Bárbara García-García  6  7, Felipe Javier Chaves  6  7

Author affiliations

  • 1 Universitat Politècnica de València, Camí de Vera, s/n, 46022, València, Spain. jonacer@upv.es.
  • 2 ITI, Universitat Politècnica de València, Camino de Vera s/n, 46022, València, Spain. jonacer@upv.es.
  • 3 ITI, Instituto Tecnológico de Informática, Camino de Vera s/n, 46022, València, Spain.
  • 4 Universitat Politècnica de València, Camí de Vera, s/n, 46022, València, Spain.
  • 5 ITI, Universitat Politècnica de València, Camino de Vera s/n, 46022, València, Spain.
  • 6 Genomic and Diabetes Unit, INCLIVA Biomedical Research Institute, 46010, València, Spain.
  • 7 CIBERDEM, ISCIII, Madrid, Spain.
  • 8 UGC Endocrinología y Nutrición, Hospital regional Universitario de Málaga, Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Málaga, Spain.
  • 9 Bellvitge Hospital-IDIBELL, Barcelona, Spain.
  • 10 Department of Clinical Sciences, Barcelona, Spain.
  • 11 Medical School, University Complutense, Madrid, Spain.
  • 12 Endocrinology and Nutrition Department, Hospital Clínico Universitario San Carlos, Madrid, Spain.
  • 13 EAP Raval Sud, Catalan Institute of Health, GEDAPS Network, Primary Care, Research Support Unit (IDIAP-Jordi Gol Foundation), Barcelona, Spain.
  • 14 Department of Endocrinology and Nutrition, Central University Hospital of Asturias, Health Research Institute of the Principality of Asturias, Oviedo, Spain.
  • 15 CIBERER, Madrid, Spain.
  • 16 Cruces University Hospital, Biocruces Bizkaia Health Research Institute, Endo-ERN, UPV/EHU, Barakaldo, Spain.
  • DOI: 10.1007/s11517-025-03355-5 PMID: 40198441

    Abstract

    Type 2 diabetes (T2D) is becoming one of the leading health problems in Western societies, diminishing quality of life and consuming a significant share of healthcare resources. This study presents machine learning models for T2D diagnosis and prognosis, developed using heterogeneous data from a Spanish population dataset (Di@bet.es study). The models were trained exclusively on individuals classified as controls and undiagnosed diabetics, ensuring that the results are not influenced by treatment effects or behavioral changes due to disease awareness. Two data domains are considered: environmental (patient lifestyle questionnaires and measurements) and clinical (biochemical and anthropometric measurements). The preprocessing pipeline consists of four key steps: geospatial data extraction, feature engineering, missing data imputation, and quasi-constancy filtering. Two working scenarios (Environmental and Healthcare) are defined based on the features used, and applied to two targets (diagnosis and prognosis), resulting in four distinct models. The feature subsets that best predict the target have been identified based on permutation importance and sequential backward selection, reducing the number of features and, consequently, the cost of predictions. In the Environmental scenario, models achieved an AUROC of 0.86 for diagnosis and 0.82 for prognosis. The Healthcare scenario performed better, with an AUROC of 0.96 for diagnosis and 0.88 for prognosis. A partial dependence analysis of the most relevant features is also presented. An online demo page showcasing the Environmental and Healthcare T2D prognosis models is available upon request.

    Keywords: Diagnosis and prognosis risk estimation; Feature selection; Geospatial data augmentation; Heterogeneous missing data imputation; Quasi-constancy heuristic; Type 2 diabetes mellitus.

    Keywords: machine learning; type 2 diabetes; diagnosis; prognosis; heterogeneous features
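Two of the steps named in the abstract, quasi-constancy filtering and permutation importance scored by AUROC, can be illustrated with a small standard-library sketch. This is not the authors' implementation: the abstract does not define the quasi-constancy heuristic, so the rule used here (drop a feature when its most frequent value covers more than a threshold share of rows) is an assumption, as are the 0.95 threshold, the toy data, the feature names, and the stand-in `predict` function that takes the place of a trained model. Sequential backward selection is not shown, because it requires retraining a model on each candidate subset.

```python
import random
from collections import Counter


def drop_quasi_constant(rows, names, threshold=0.95):
    """Keep features whose most frequent value covers at most `threshold` of rows.

    Assumed heuristic; the abstract does not specify the exact rule.
    """
    kept = []
    for j, name in enumerate(names):
        counts = Counter(row[j] for row in rows)
        top_share = counts.most_common(1)[0][1] / len(rows)
        if top_share <= threshold:
            kept.append(name)
    return kept


def auroc(y_true, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y, n_features, n_repeats=20, seed=0):
    """Mean drop in AUROC when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by its shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - auroc(y, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: 40 individuals, three features.
names = ["glucose", "noise", "rare_flag"]
rows = [(i, i % 7, 1 if i == 0 else 0) for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]

# Step 1: remove quasi-constant features ("rare_flag" is 0 in 39 of 40 rows).
kept = drop_quasi_constant(rows, names)
kept_idx = [names.index(n) for n in kept]
filtered = [tuple(r[j] for j in kept_idx) for r in rows]


# Step 2: score the remaining features against a stand-in model.
def predict(row):
    """Stand-in for a trained model: the risk score is just the first feature."""
    return row[0]


baseline, importances = permutation_importance(predict, filtered, y, len(kept))
for name, imp in zip(kept, importances):
    print(f"{name}: mean AUROC drop = {imp:.3f}")
```

In this sketch a feature the stand-in model never reads has an importance of exactly zero, while shuffling the feature it relies on pulls AUROC toward chance level. A feature selection loop of the kind the abstract describes would use such scores to decide which features to discard first.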


    Copyright © Medical & biological engineering & computing.

    Related information

    Journal: Medical & biological engineering & computing

    Abbreviation: MED BIOL ENG COMPUT

    ISSN: 0140-0118

    e-ISSN: 1741-0444

    IF/Quartile: 2.6/Q3
