Willi Mann, Nikolaus Augsten, Christian S Jensen et al.
We provide efficient support for applications that aim to continuously find pairs of similar sets in rapid streams, such as Twitter streams that emit tweets as sets of words. Using a sliding window model, the top-k result changes as new set...
HPCache: memory-efficient OLAP through proportional caching revisited
Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki
Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy of the...
Ahmet Kara, Milos Nikolic, Dan Olteanu et al.
This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression mo...
(p,q)-biclique counting and enumeration for large sparse bipartite graphs
Jianye Yang, Yun Peng, Dian Ouyang et al.
In this paper, we study the problem of (p, q)-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph G=(U,V,E) and two integer parameters p and q, we aim to efficiently count and enumerate all (p, q)-bi...
Wenfei Fan, Yuanhao Li, Muyang Liu et al.
This paper proposes a scheme to reduce big graphs to small graphs. It contracts obsolete parts and regular structures into supernodes. The supernodes carry a synopsis S_Q for each query class Q in use, to abstract key features of the contr...
Magdalena Balazinska, Xiaofang Zhou
Egawati Panjei, Le Gruenwald, Eleazar Leal et al.
While many techniques for outlier detection have been proposed in the literature, the interpretation of detected outliers is often left to users. As a result, it is difficult for users to promptly take appropriate actions concerning the det...
Alexander Ratner, Stephen H Bach, Henry Ehrenberg et al.
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data....
Matteo Interlandi, Ari Ekmekji, Kshitij Shah et al.
Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countles...
Chen Zeng, Jeffrey F Naughton, Jin-Yi Cai
We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very diff...