Sparse online learning and cost-sensitive learning are two important areas of machine learning and data mining research. Each has been well studied, with many interesting algorithms developed, yet very little published work addresses the joint study of these two fields. In this paper, to tackle high-dimensional data streams with skewed class distributions, we introduce a framework of cost-sensitive sparse online learning. Our proposed framework is a substantial extension of the influential Truncated Gradient (TG) method, formulating a new convex optimization problem in which two mutually constraining factors, misclassification cost and sparsity, are balanced simultaneously and favorably. We theoretically analyze the regret and cost bounds of the proposed algorithm and pinpoint its theoretical merit compared to existing related approaches. Large-scale empirical comparisons against five baseline methods on eight real-world streaming datasets demonstrate the encouraging performance of the developed method. Algorithm implementation and datasets are available upon request.
Keywords: Cost-sensitive learning; Data streams; Online learning; Optimization; Sparse learning; Truncated gradient.
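Since the framework above extends the Truncated Gradient (TG) method, a brief sketch of TG's core update may help orient the reader. The snippet below is an illustrative implementation of the standard truncation operator from Langford, Li, and Zhang's TG method, not the cost-sensitive extension proposed in this paper; the names `truncate`, `gravity`, and `theta` are our own labels for the shrinkage amount and truncation threshold.

```python
import numpy as np

def truncate(w, gravity, theta):
    """TG truncation operator (illustrative): shrink each coordinate of w
    toward zero by `gravity`, but only coordinates whose magnitude is at
    most `theta`; larger coordinates are left untouched."""
    w = np.asarray(w, dtype=float)
    small = np.abs(w) <= theta                                # eligible for truncation
    shrunk = np.sign(w) * np.maximum(np.abs(w) - gravity, 0.0)  # soft-threshold
    return np.where(small, shrunk, w)

def tg_step(w, grad, eta, gravity, theta):
    """One online round (sketch): a gradient-descent step on the current
    example's loss, followed by truncation to induce sparsity."""
    return truncate(w - eta * grad, eta * gravity, theta)
```

The truncation is what yields sparse models online: small weights are driven exactly to zero, while informative large weights survive. The cost-sensitive framework described in the abstract additionally weighs the gradient step by per-class misclassification costs, which this plain-TG sketch omits.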