Biology Open. 2022 Apr 15;11(4):bio059001. doi: 10.1242/bio.059001

Dirichlet process mixture models for single-cell RNA-seq clustering

Nigatu A Adossa  1, Kalle T Rytkönen  1  2, Laura L Elo  1  3

Author affiliations

  • 1 Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland.
  • 2 Institute of Biomedicine, Research Centre for Integrative Physiology and Pharmacology, University of Turku, FI-20014, Finland.
  • 3 Institute of Biomedicine, University of Turku, FI-20014, Finland.
  • DOI: 10.1242/bio.059001 PMID: 35237784

    Abstract

    Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is the unknown number of clusters, an issue for which there is still no comprehensive solution. To enhance the process of defining a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (DB-index) and extrinsic (ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.

    Keywords: Clustering; Hierarchical Dirichlet process (HDP); Latent Dirichlet allocation (LDA); ScRNA-seq.
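
    The two cluster quality measures named in the abstract can be illustrated with a minimal pure-Python sketch: the adjusted Rand index (ARI, extrinsic, compares a clustering against reference labels) and the Davies-Bouldin index (DB-index, intrinsic, uses only the data and the cluster assignment). This is not the authors' code; the function names, the Euclidean distance, and the toy points are assumptions made purely for illustration.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Extrinsic measure: chance-corrected agreement between two labelings.

    Assumes at least two items. Higher is better; 1.0 means identical
    partitions (label names themselves do not matter).
    """
    n = len(labels_true)
    # Pairs of items placed together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)


def davies_bouldin_index(points, labels):
    """Intrinsic measure: average worst-case ratio of within-cluster scatter
    to between-centroid distance. Lower is better.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    scatter = {
        label: sum(dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy example (hypothetical data): two well-separated groups of "cells".
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
reference = [0, 0, 1, 1]
matching = [1, 1, 0, 0]     # same partition as the reference, labels swapped
mismatched = [0, 1, 0, 1]   # splits each true group in half

for name, candidate in [("matching", matching), ("mismatched", mismatched)]:
    print(name,
          "ARI:", adjusted_rand_index(reference, candidate),
          "DB-index:", davies_bouldin_index(points, candidate))
```

    In this toy case the matching partition gets the higher ARI and the lower DB-index. Scoring each candidate number of clusters in the same way is one simple way to compare clusterings at different resolutions.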

    Copyright © Biology Open.

    Journal information

    Journal: Biology Open

    Abbreviation: BIOL OPEN

    ISSN: 2046-6390

    e-ISSN: 2046-6390
