Nutrition & diabetes. 2022 May 27;12(1):27. doi: 10.1038/s41387-022-00206-2

Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach

Saptarshi Bej 1,2, Jit Sarkar 3,4, Saikat Biswas 5, Pabitra Mitra 6, Partha Chakrabarti 7,8, Olaf Wolkenhauer 9,10,11

Author affiliations

  • 1 Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany. saptarshibej24@gmail.com.
  • 2 Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany. saptarshibej24@gmail.com.
  • 3 Division of Cell Biology and Physiology, CSIR-Indian Institute of Chemical Biology, Kolkata, India. jitnpur@gmail.com.
  • 4 Academy of Innovative and Scientific Research, Ghaziabad, India. jitnpur@gmail.com.
  • 5 Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, India.
  • 6 Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur, India.
  • 7 Division of Cell Biology and Physiology, CSIR-Indian Institute of Chemical Biology, Kolkata, India.
  • 8 Academy of Innovative and Scientific Research, Ghaziabad, India.
  • 9 Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany. olaf.wolkenhauer@uni-rostock.de.
  • 10 Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany. olaf.wolkenhauer@uni-rostock.de.
  • 11 Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch, South Africa. olaf.wolkenhauer@uni-rostock.de.
  • DOI: 10.1038/s41387-022-00206-2 PMID: 35624098

    Abstract

    Background: Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on the detection of T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which contains a wide spectrum of features for 10,125 T2DM patients, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns.

    Methods: Epidemiological data are challenging to analyse because they contain diverse feature types. Applying the state-of-the-art dimension reduction tool UMAP in the conventional way was found to be ineffective for the NFHS-4 dataset. We therefore implemented a distributed clustering workflow that combines different similarity-measure settings of UMAP to cluster continuous, ordinal and nominal features separately. We then integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data.
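
The feature-type-distributed idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity measures were used or how the reduced dimensions were integrated, so the metric mapping in `METRIC_BY_TYPE`, the integration by simple concatenation, and all function names below are assumptions made for illustration. The `umap_reduce` helper assumes the `umap-learn` and `numpy` packages are installed and imports them lazily.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
# A reducer maps (rows of one feature group, metric name) to low-dimensional rows.
Reducer = Callable[[Matrix, str], Matrix]

# ASSUMPTION: one similarity measure per feature type. The abstract only says
# that different UMAP similarity settings were combined, not which ones.
METRIC_BY_TYPE: Dict[str, str] = {
    "continuous": "euclidean",
    "ordinal": "canberra",
    "nominal": "hamming",
}


def select_columns(rows: Matrix, cols: List[int]) -> Matrix:
    """Return only the given column indices from every row."""
    return [[row[c] for c in cols] for row in rows]


def distributed_embedding(
    rows: Matrix, columns_by_type: Dict[str, List[int]], reduce: Reducer
) -> Matrix:
    """Reduce each feature-type group separately, then join the results.

    ASSUMPTION: the per-type embeddings are integrated by concatenating
    them row by row; the source does not specify the integration step.
    """
    blocks = []
    for feature_type, cols in columns_by_type.items():
        sub = select_columns(rows, cols)
        embedded = reduce(sub, METRIC_BY_TYPE[feature_type])
        if len(embedded) != len(rows):
            raise ValueError("reducer must return one row per input row")
        blocks.append(embedded)
    return [
        [value for block in blocks for value in block[i]]
        for i in range(len(rows))
    ]


def umap_reduce(sub: Matrix, metric: str) -> Matrix:
    """Reducer backed by umap-learn (imported lazily, so it stays optional)."""
    import numpy as np
    import umap

    model = umap.UMAP(n_components=2, metric=metric, random_state=0)
    return model.fit_transform(np.asarray(sub, dtype=float)).tolist()
```

A call such as `distributed_embedding(rows, {"continuous": [0, 1], "ordinal": [4], "nominal": [2, 3]}, umap_reduce)` would return one integrated low-dimensional row per patient, which could then be passed to a clustering algorithm of choice. Keeping the reducer as a parameter makes it easy to swap in a different dimension reduction method or metric per feature type.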

    Results: Our analysis reveals four significant clusters, two of which comprise mainly non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mostly of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients followed a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods.

    Conclusions: From a methodological perspective, we show that for the diverse data types common in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, in contrast to the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. We conclude that the existence of significant non-obese T2DM sub-populations, characterized by younger age groups and economic disadvantage, raises the need for different screening criteria for T2DM among rural Indian residents.

    Keywords: type 2 diabetes; unsupervised machine learning

    Copyright © Nutrition & diabetes.

    Related information

    Journal: Nutrition & diabetes

    Abbreviation: NUTR DIABETES

    ISSN: 2044-4052

    Impact factor/quartile: 4.6/Q1
