2002 Digital Symposium Collection

Optimal and Approximate Computation of Summary Statistics for Range Aggregates

Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, and Martin Strauss
View Paper (PDF)

Return to Aggregates

Abstract

Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and on-line query processing. Hence, there has been a lot of focus on "selectivity estimation", that is, computing summary statistics on the underlying data and using that to answer ag-gregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-knownwavelet-based methods have this property. We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.