Jonathon J OBrien, Michael T Lawson, Devin K Schweppe et al.
The distinction between classification and clustering is often based on a priori knowledge of classification labels. However, in the purely theoretical situation where a data-generating model is known, the optimal solutions for clustering d...
Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models
Roberto Di Mari, Salvatore Ingrassia, Antonio Punzo
In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based R² is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtur...
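As a point of reference for the deviance-based R² mentioned above, here is a minimal sketch for a single Poisson GLM (not the mixture extension the paper develops). It assumes the fitted means are already available from some fitting routine; the function names and example values are hypothetical. The R² compares the deviance of the fitted model with that of the intercept-only (null) model, R² = 1 - D(fitted) / D(null).

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    Terms with y == 0 contribute y * log(y / mu) = 0 by convention.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def deviance_r2(y, mu_fitted):
    """Deviance R^2 = 1 - D(fitted) / D(null).

    Assumption: the null model is the intercept-only Poisson GLM,
    whose fitted mean is the sample mean of y.
    """
    ybar = sum(y) / len(y)
    d_null = poisson_deviance(y, [ybar] * len(y))
    d_fit = poisson_deviance(y, mu_fitted)
    return 1.0 - d_fit / d_null


# Hypothetical counts and fitted means from some Poisson regression.
y = [0, 2, 3, 5, 9]
mu_fitted = [0.5, 2.0, 3.5, 5.0, 8.0]
print(deviance_r2(y, mu_fitted))
```

A perfect fit (mu equal to y) gives R² = 1, and reproducing the null model gives R² = 0, which is what makes the ratio of nested-model deviances usable as a goodness-of-fit summary.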
Minji Kim, Hee-Seok Oh, Yaeji Lim
This study develops a new clustering method for high-dimensional zero-inflated time series data. The proposed method is based on thick-pen transform (TPT), in which the basic idea is to draw along the data with a pen of a given thickness. S...
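The "pen of a given thickness" idea can be illustrated with a simplified sketch. It assumes a square pen whose thickness tau is measured along the time axis, so that the pen traces an upper and a lower envelope given by the running maximum and minimum of the next tau + 1 observations. This is an illustrative approximation under that assumption, not necessarily the exact TPT definition used in the paper, and the function name is hypothetical.

```python
def thick_pen_bounds(x, tau):
    """Simplified square-pen envelopes of a series x for thickness tau.

    Assumption: upper[t] = max(x[t..t+tau]) and lower[t] = min(x[t..t+tau]),
    with the window truncated at the end of the series; any vertical extent
    of the pen is ignored.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    n = len(x)
    upper, lower = [], []
    for t in range(n):
        window = x[t:min(n, t + tau + 1)]
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


# A zero-inflated toy series: thicker pens widen the band around isolated spikes.
series = [0, 0, 3, 0, 0, 1]
print(thick_pen_bounds(series, 1))
```

With tau = 0 the band collapses onto the series itself; increasing tau smooths over short zero runs, which is one intuition for why a thick pen can help when comparing zero-inflated series.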
DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling
Marian Lux, Stefanie Rinderle-Ma
This work studies the problem of clustering one-dimensional data points such that they are evenly distributed over a given number of low variance clusters. One application is the visualization of data on choropleth maps or on business proce...
Similarity-Reduced Diversities: the Effective Entropy and the Reduced Entropy
François Bavaud
The paper presents and analyzes the properties of a new diversity index, the effective entropy, which lowers Shannon entropy by taking into account the presence of similarities between items. Similarities decrease exponentially with the ite...
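The general idea of lowering Shannon entropy by accounting for similarities can be sketched with a related, well-known index: the order-1 similarity-sensitive entropy of Leinster and Cobbold, H_Z(p) = -Σ p_i log((Zp)_i). This is a swapped-in illustration, not the effective entropy defined in the paper. Here the similarity matrix is assumed to be Z_ij = exp(-t d_ij), built from a dissimilarity matrix d with zero diagonal, so that similarities decay exponentially with dissimilarity; the function names and the parameter t are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def similarity_sensitive_entropy(p, d, t):
    """Order-1 similarity-sensitive entropy -sum_i p_i * log((Z p)_i).

    Assumption: Z[i][j] = exp(-t * d[i][j]) with d[i][i] == 0, so Z[i][i] == 1.
    Because (Z p)_i >= p_i, the result never exceeds the Shannon entropy.
    """
    n = len(p)
    z = [[math.exp(-t * d[i][j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        if p[i] > 0:
            zp_i = sum(z[i][j] * p[j] for j in range(n))
            total -= p[i] * math.log(zp_i)
    return total


p = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]
for t in (0.0, 1.0, 50.0):
    print(t, similarity_sensitive_entropy(p, d, t), shannon_entropy(p))
```

At t = 0 every pair of items is treated as identical and the entropy drops to 0; as t grows, Z approaches the identity matrix and the value approaches the Shannon entropy log 2.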
Paul D McNicholas
Alessandro Casa, Charles Bouveyron, Elena Erosheva et al.
Multivariate time-dependent data, where multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. To model these data, we need to account for relations among both time instan...
ROC and AUC with a Binary Predictor: a Potentially Misleading Metric
John Muschelli
In the analysis of binary outcomes, the receiver operating characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about the performance over a series of thresholds and can be summ...
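To see why a binary predictor is a special case, note that thresholding a 0/1 prediction yields only one ROC point besides (0, 0) and (1, 1). A minimal sketch, with hypothetical function names and toy data, computes that point and the trapezoidal area under the resulting curve. By algebra, the trapezoidal AUC reduces to (TPR + 1 - FPR) / 2, that is, the mean of sensitivity and specificity.

```python
def roc_points_binary(pred, truth):
    """ROC points (FPR, TPR) for a 0/1 predictor against 0/1 labels.

    Only three points exist: (0, 0), the single operating point, and (1, 1).
    """
    tp = sum(1 for p, y in zip(pred, truth) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, truth) if p == 1 and y == 0)
    pos = sum(truth)
    neg = len(truth) - pos
    return [(0.0, 0.0), (fp / neg, tp / pos), (1.0, 1.0)]


def trapezoid_auc(points):
    """Area under a piecewise-linear curve through points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
pts = roc_points_binary(pred, truth)
fpr, tpr = pts[1]
print(pts, trapezoid_auc(pts), (tpr + (1 - fpr)) / 2)
```

Because linear interpolation joins the single operating point to the corners, the AUC here summarizes one threshold rather than a range of thresholds, which is worth keeping in mind when comparing it with AUCs from continuous scores.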
Richard D Payne, Bani K Mallick
This paper discusses the challenges presented by tall data problems associated with Bayesian classification (specifically binary classification) and the existing methods to handle them. Current methods include parallelizing the likelihood, ...
Katie Evans, Tanzy Love, Sally W Thurston
In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a ...