Statistical analysis and data mining: article index

Bayesian Posterior Interval Calibration to Improve the Interpretability of Observational Studies

Jami J Mulgrave, David Madigan, George Hripcsak

Observational healthcare data offer the potential to estimate causal effects of medical products on a large scale. However, the confidence intervals and p-values produced by observational studies only account for random error and fail to ac...

Statistical analysis and data mining. 2024 Dec;17(6):e11715. DOI: 10.1002/sam.11715

A treeless absolutely random forest with closed-form estimators of expected proximities

Eugene Laska, Ziqiang Lin, Carole Siegel et al.

We introduce a simple variant of a Purely Random Forest, an Absolute Random Forest (ARF) for clustering. At every node, splits of units are determined by a randomly chosen feature and a random threshold drawn from a uniform distribution whos...

Statistical analysis and data mining. 2024 Apr;17(2):e11678. DOI: 10.1002/sam.11678

Data-driven Stochastic Model for Quantifying the Interplay Between Amyloid-beta and Calcium Levels in Alzheimer's Disease

Hina Shaheen, Roderick Melnik, Sundeep Singh; Alzheimer’s Disease Neuroimaging Initiative

The abnormal aggregation of extracellular amyloid-β (Aβ) in senile plaques resulting in calcium (Ca²⁺) dyshomeostasis is one of the primary symptoms of Alzheimer's disease (AD). Significant research efforts have been devoted in the p...

Statistical analysis and data mining. 2024 Apr;17(2):e11679. DOI: 10.1002/sam.11679

A tree-based gene-environment interaction analysis with rare features

Mengque Liu, Qingzhao Zhang, Shuangge Ma

Gene-environment (G-E) interaction analysis plays a critical role in understanding and modeling complex diseases. Compared to main-effect-only analysis, it is more seriously challenged by higher dimensionality, weaker signals, and the uniqu...

Statistical analysis and data mining. 2022 Oct;15(5):648-674. DOI: 10.1002/sam.11578

Integrative Learning of Structured High-Dimensional Data from Multiple Datasets

Changgee Chang, Zongyu Dai, Jihwan Oh et al.

Integrative learning of multiple datasets has the potential to mitigate the challenge of small n and large p that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be ...

Statistical analysis and data mining. 2023 Apr;16(2):120-134. DOI: 10.1002/sam.11601

A Tutorial on Generative Adversarial Networks with Application to Classification of Imbalanced Data

Yuxiao Huang, Kara G Fields, Yan Ma

A challenge unique to classification model development is imbalanced data. In a binary classification problem, class imbalance occurs when one class, the minority group, contains significantly fewer samples than the other class, the majorit...

Statistical analysis and data mining. 2022 Oct;15(5):543-552. DOI: 10.1002/sam.11570

Regression-Based Bayesian Estimation and Structure Learning for Nonparanormal Graphical Models

Jami J Mulgrave, Subhashis Ghosal

A nonparanormal graphical model is a semiparametric generalization of a Gaussian graphical model for continuous variables in which it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone tr...

Statistical analysis and data mining. 2022 Oct;15(5):611-629. DOI: 10.1002/sam.11576

A General Iterative Clustering Algorithm

Ziqiang Lin, Eugene Laska, Carole Siegel

The quality of a cluster analysis of unlabeled units depends on the quality of the between-unit dissimilarity measures. Data-dependent dissimilarity is more objective than data-independent geometric measures such as Euclidean distance. As ...

Statistical analysis and data mining. 2022 Aug;15(4):433-446. DOI: 10.1002/sam.11573

Multi-scale affinities with missing data: Estimation and applications

Min Zhang, Gal Mishne, Eric C Chi

Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights ha...

Statistical analysis and data mining. 2022 Jun;15(3):303-313. DOI: 10.1002/sam.11561

Bag of little bootstraps for massive and distributed longitudinal data

Xinkai Zhou, Jin J Zhou, Hua Zhou

Linear mixed models are widely used for analyzing longitudinal datasets, and the inference for variance component parameters relies on the bootstrap method. However, health systems and technology companies routinely generate massive longitu...

Statistical analysis and data mining. 2022 Jun;15(3):314-321. DOI: 10.1002/sam.11563