Welcome to D
SIGMOD'00
 = SIGMOD'00 We
 = Plenary Talk
<<< = SIGMOD'00 Pa>>>
PODS'00
SIGMOD Recor
CIKM 2000/CI
COMAD 2000
Data Enginee
DL 2000
DPDJ
EDBT 2000
Hypertext 20
ICDE 2000
KDD 2000
KDD Explorat
KRDB 2000
SBBD 2000
SIGIR 2000
SIGIR Forum
SSDBM 2000
TODS
VLDB'00
VLDBJ

Congressional Samples for Approximate Answering of Group-By Queries


Swarup Acharya, Phillip B. Gibbons, and Viswanath Poosala



Return to Research Sessions


Abstract

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each group (group-by queries). Depending on the data, there can be a wide disparity between the number of data items in each group. As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very few (often zero) tuples.

In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. These techniques rely on precomputed non-uniform (biased) samples of the data. In particular, we propose congressional samples, a hybrid union of uniform and biased samples.- Given a fixed amount of space, congressional samples seek to maximize the accuracy for all possible group-by queries on a set of columns. We present a one pass algorithm for constructing a congressional sample and use this technique to also incrementally maintain the sample up-to-date without accessing the base relation. We also evaluate query rewriting strategies for providing approximate answers from congressional samples. Finally, we conduct an extensive set of experiments on the TPC-D database, which demonstrates the efficacy of the techniques proposed.


References


Note: References link to DBLP on the Web.

[AGP99a]
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala : Aqua: A Fast Decision Support Systems Using Approximate Query Answers. VLDB 1999 : 754-757
[AGP99b]
...
[AGPR99]
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy : Join Synopses for Approximate Query Answering. SIGMOD Conference 1999 : 275-286
[CD97]
Surajit Chaudhuri , Umeshwar Dayal : An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26(1) : 65-74(1997)
[CMN99]
Surajit Chaudhuri , Rajeev Motwani , Vivek R. Narasayya : On Random Sampling over Joins. SIGMOD Conference 1999 : 263-274
[Coc77]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
[CS94]
Surajit Chaudhuri , Kyuseok Shim : Including Group-By in Query Optimization. VLDB 1994 : 354-366
[CS95]
Surajit Chaudhuri , Kyuseok Shim : An Overview of Cost-based Optimization of Queries with Aggregates. Data Engineering Bulletin 18(3) : 3-9(1995)
[GM98]
Phillip B. Gibbons , Yossi Matias : New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998 : 331-342
[HH99]
Peter J. Haas , Joseph M. Hellerstein : Ripple Joins for Online Aggregation. SIGMOD Conference 1999 : 287-298
[HHW97]
Joseph M. Hellerstein , Peter J. Haas , Helen Wang : Online Aggregation. SIGMOD Conference 1997 : 171-182
[IP99]
Yannis E. Ioannidis , Viswanath Poosala : Histogram-Based Approximation of Set-Valued Query-Answers. VLDB 1999 : 174-185
[Kim96]
Ralph Kimball : The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley 1996, ISBN 0-471-15337-0
[Olk93]
Frank Olken : Random Sampling from Databases. Ph.D. thesis, University of California at Berkeley LBL Technical Report 1993
[PIHS96]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[SAC+79]
Patricia G. Selinger , Morton M. Astrahan , Donald D. Chamberlin , Raymond A. Lorie , Thomas G. Price : Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979 : 23-34
[Sch97]
...
[TPC99]
Transaction Processing Performance Council. http://www.tpc.org
[VW99]
Jeffrey Scott Vitter , Min Wang : Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999 : 193-204

BIBTEX


@inproceedings{DBLP:conf/sigmod/AcharyaGP00,
  author    = {Swarup Acharya and
                Phillip B. Gibbons and
                Viswanath Poosala},
   editor    = {Weidong Chen and
                Jeffrey F. Naughton and
                Philip A. Bernstein},
   title     = {Congressional Samples for Approximate Answering of Group-By Queries},
   booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
                Management of Data, May 16-18, 2000, Dallas, Texas, USA},
   journal   = {SIGMOD Record},
   publisher = {ACM},
   volume    = {29},
   number    = {2},
   year      = {2000},
   isbn      = {1-58113-218-2},
   pages     = {487-498},
   crossref  = {DBLP:conf/sigmod/2000},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },




DiSC'01 Copyright ©2002 ACM Inc.