Approximate Medians and other Quantiles in One Pass and with Limited Memory
Sridhar Rajagopalan, Gurmeet Singh Manku, Bruce G. Lindsay
Full Paper (PDF)

Slides (HTML)

Abstract
We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply without regard to the value distribution or the arrival distribution of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude.

We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter.

We present the algorithms, their theoretical analysis and simulation results.

References

References, where available, link to the DBLP on the World Wide Web.

[1]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34
[2]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276
[3]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996: 294-305
[4]
...
[5]
...
[6]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291
[7]
Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan: Time Bounds for Selection. JCSS 7(4): 448-461(1973)
[8]
...
[9]
...
[10]
...
[11]
...
[12]
...
[13]
...
[14]
...
[15]
J. Ian Munro, Michael S. Paterson: Selection and Sorting with Limited Storage. TCS 12: 315-323(1980)
[16]
Raj Jain, Imrich Chlamtac: The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations. CACM 28(10): 1076-1085(1985)
[17]
Rakesh Agrawal, Arun N. Swami: A One-Pass Space-Efficient Algorithm for Finding Quantiles. COMAD 1995: 0-
[18]
Khaled Alsabti, Sanjay Ranka, Vineet Singh: A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data. VLDB 1997: 346-355
[19]
...
BIBTEX

@inproceedings{DBLP:conf/sigmod/RajagopalanML98,
author = {Gurmeet Singh Manku and
Sridhar Rajagopalan and
Bruce G. Lindsay},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Approximate Medians and other Quantiles in One Pass and with
Limited Memory},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-955-5},
pages = {426-435},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).