External Memory Algorithms
Jeffrey Scott Vitter
Full Paper (PDF)

Slides (HTML)

Abstract
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between the fast internal memory and the slower external memory (such as disks) can be a major performance bottleneck. In this tutorial we survey the state of the art in the design and analysis of external memory algorithms (also known as out-of-core algorithms or I/O algorithms). External memory algorithms are often designed using the parallel disk model (PDM). Its three primary machine-independent measures of an algorithm's performance are the number of I/O operations performed, the CPU time, and the amount of disk space used. The PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle cache hierarchies, hierarchical memory, and tertiary storage.

We discuss a variety of problems and identify a set of paradigms for solving them efficiently in external memory. Programming tools and environments are available for simplifying the programming task. We present experiments involving some newly developed algorithms for spatial databases, implemented using the TPIE programming environment. The empirical timings show a significant speedup attained by the new algorithms over currently used methods.

References

References, where available, link to the DBLP on the World Wide Web.

[1]
...
[2]
Pankaj K. Agarwal, Lars Arge, Jeff Erickson, Paolo Giulio Franciosa, Jeffrey Scott Vitter: Efficient Searching with Linear Constraints. PODS 1998: 169-178
[3]
...
[4]
...
[5]
...
[6]
Alok Aggarwal, Jeffrey Scott Vitter: The Input/Output Complexity of Sorting and Related Problems. CACM 31(9): 1116-1127(1988)
[7]
Lars Arge: The Buffer Tree: A New Technique for Optimal I/O-Algorithms (Extended Abstract). WADS 1995: 334-345
[8]
...
[9]
...
[10]
...
[11]
...
[12]
Lars Arge, Mikael Knudsen, Kirsten Larsen: A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms. WADS 1993: 83-94
[13]
...
[14]
...
[15]
...
[16]
...
[17]
...
[18]
...
[19]
...
[20]
Rakesh D. Barve, Edward F. Grove, Jeffrey Scott Vitter: Simple Randomized Mergesort on Parallel Disks. Parallel Computing 23(4-5): 601-631(1997)
[21]
Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indices. Acta Informatica 1: 173-189(1972)
[22]
Bruno Becker, Stephan Gschwind, Thomas Ohler, Bernhard Seeger, Peter Widmayer: An Asymptotically Optimal Multiversion B-Tree. VLDB Journal 5(4): 264-275(1996)
[23]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
[24]
Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations. EDBT 1998: 216-230
[25]
...
[26]
Paul B. Callahan, Michael T. Goodrich, Kumar Ramaiyer: Topology B-Trees and Their Applications. WADS 1995: 381-392
[27]
Peter M. Chen, Edward L. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson: RAID: High-Performance, Reliable Secondary Storage. Computing Surveys 26(2): 145-185(1994)
[28]
...
[29]
Yi-Jen Chiang: Experiments on the Practical I/O Efficiency of Geometric Algorithms: Distribution Sweep vs. Plane Sweep. WADS 1995: 346-357
[30]
...
[31]
...
[32]
...
[33]
...
[34]
...
[35]
Robert Cypher, C. Greg Plaxton: Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers. JCSS 47(3): 501-548(1993)
[36]
...
[37]
...
[38]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291
[39]
James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre Tarjan: Making Data Structures Persistent. JCSS 38(1): 86-124(1989)
[40]
...
[41]
...
[42]
Paolo Ferragina, Roberto Grossi: A Fully-Dynamic Data Structure for External Substring Search (Extended Abstract). STOC 1995: 693-702
[43]
...
[44]
...
[45]
Garth A. Gibson, Jeffrey Scott Vitter, John Wilkes: Strategic Directions in Storage I/O Issues in Large-Scale Computing. Computing Surveys 28(4): 779-793(1996)
[46]
Michael T. Goodrich, Jyh-Jong Tsay, Darren Erik Vengroff, Jeffrey Scott Vitter: External-Memory Computational Geometry (Preliminary Version). FOCS 1993: 714-723
[47]
Diane Greene: An Implementation and Performance Analysis of Spatial Data Access Methods. ICDE 1989: 606-615
[48]
Roberto Grossi, Giuseppe F. Italiano: Efficient Splitting and Merging Algorithms for Order Decomposable Problems (Extended Abstract). ICALP 1997: 605-615
[49]
...
[50]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57
[51]
Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou: On the Analysis of Indexing Schemes. PODS 1997: 249-256
[52]
Lisa Hellerstein, Garth A. Gibson, Richard M. Karp, Randy H. Katz, David A. Patterson: Coding Techniques for Handling Failures in Large Disk Arrays. Algorithmica 12(2/3): 182-208(1994)
[53]
Jia-Wei Hong, H. T. Kung: I/O Complexity: The Red-Blue Pebble Game. STOC 1981: 326-333
[54]
...
[55]
Ibrahim Kamel, Christos Faloutsos: On Packing R-trees. CIKM 1993: 490-499
[56]
Ibrahim Kamel, Christos Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. VLDB 1994: 500-509
[57]
...
[58]
Paris C. Kanellakis, Gabriel M. Kuper, Peter Z. Revesz: Constraint Query Languages. PODS 1990: 299-313
[59]
Paris C. Kanellakis, Sridhar Ramaswamy, Darren Erik Vengroff, Jeffrey Scott Vitter: Indexing for Data Models with Constraints and Classes. PODS 1993: 233-243
[60]
David G. Kirkpatrick, Raimund Seidel: The Ultimate Planar Convex Hull Algorithm? SIAM J. Comput. 15(1): 287-299(1986)
[61]
...
[62]
...
[63]
...
[64]
...
[65]
Charles E. Leiserson, Satish Rao, Sivan Toledo: Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers (Extended Abstract). FOCS 1993: 704-713
[66]
...
[67]
...
[68]
David B. Lomet, Betty Salzberg: Concurrency and Recovery for Index Trees. VLDB Journal 6(3): 224-240(1997)
[69]
...
[70]
Mark H. Nodine, Michael T. Goodrich, Jeffrey Scott Vitter: Blocking for External Graph Searching. Algorithmica 16(2): 181-214(1996)
[71]
Mark H. Nodine, Daniel P. Lopresti, Jeffrey Scott Vitter: I/O Overhead and Parallel VLSI Architectures for Lattice Computations. IEEE Transactions on Computers 40(7): 843-852(1991)
[72]
...
[73]
Mark H. Nodine, Jeffrey Scott Vitter: Greed Sort: Optimal Deterministic Sorting on Parallel Disks. JACM 42(4): 919-933(1995)
[74]
HweeHwa Pang, Michael J. Carey, Miron Livny: Memory-Adaptive External Sorting. VLDB 1993: 618-629
[75]
Jignesh M. Patel, David J. DeWitt: Partition Based Spatial-Merge Join. SIGMOD Conf. 1996: 259-270
[76]
Sridhar Ramaswamy, Sairam Subramanian: Path Caching: A Technique for Optimal External Searching. PODS 1994: 25-35
[77]
Chris Ruemmler, John Wilkes: An Introduction to Disk Drive Modeling. IEEE Computer 27(3): 17-28(1994)
[78]
...
[79]
...
[80]
Hanan Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
[81]
...
[82]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518
[83]
...
[84]
...
[85]
...
[86]
...
[87]
Roberto Tamassia, Jeffrey Scott Vitter: Optimal Cooperative Search in Fractional Cascaded Data Structures. Algorithmica 15(2): 154-171(1996)
[88]
...
[89]
...
[90]
Jochen Van den Bercken, Bernhard Seeger, Peter Widmayer: A Generic Approach to Bulk Loading Multidimensional Index Structures. VLDB 1997: 406-415
[91]
...
[92]
Peter J. Varman, Rakesh M. Verma: An Efficient Multiversion Access STructure. TKDE 9(3): 391-409(1997)
[93]
...
[94]
...
[95]
...
[96]
...
[97]
Jeffrey Scott Vitter, Elizabeth A. M. Shriver: Algorithms for Parallel Memory I: Two-Level Memories. Algorithmica 12(2/3): 110-147(1994)
[98]
Jeffrey Scott Vitter, Elizabeth A. M. Shriver: Algorithms for Parallel Memory II: Hierarchical Multilevel Memories. Algorithmica 12(2/3): 148-169(1994)
[99]
...
[100]
...
[101]
...
[102]
Weiye Zhang, Per-Åke Larson: Dynamic Memory Adjustment for External Mergesort. VLDB 1997: 376-385 Referenced By:
  1. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459
BIBTEX

@inproceedings{DBLP:conf/pods/Vitter98,
author = {Jeffrey Scott Vitter},
title = {External Memory Algorithms},
booktitle = {Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium
on Principles of Database Systems, June 1-3, 1998, Seattle, Washington},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-966-3},
pages = {119-128},
crossref = {DBLP:conf/pods/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).