DiSC - VLDB'98 Papers

Hash Joins and Hash Teams in Microsoft SQL Server

Goetz Graefe, Ross Bunker, Shaun Cooper

Abstract

The query execution engine in Microsoft SQL Server employs hash-based algorithms for inner and outer joins, semi-joins, set operations (such as intersection), grouping, and duplicate removal. The implementation combines many techniques proposed individually in the research literature but never combined in a single implementation, neither in a product nor in a research prototype. One of the paper's contributions is a design that cleanly integrates most existing techniques. One technique, however, which we call hash teams and which has previously been described only in vague terms, has not been implemented in prior research or product work. It realizes in hash-based query processing many of the benefits of interesting orderings in sort-based query processing. Moreover, we describe how memory is managed in complex and bushy query evaluation plans with multiple sort and hash operations. Finally, we report on the effectiveness of hashing using two very typical database queries, including the performance effects of hash teams.

References

[Bratbergsengen 1984]

Kjell Bratbergsengen: Hashing Methods and Relational Algebra Operations. VLDB 1984: 323-333

[DeWitt 1984]

David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8

[DeWitt 1985]

David J. DeWitt, Robert H. Gerber: Multiprocessor Hash-Based Join Algorithms. VLDB 1985: 151-164

[DeWitt 1993]

David J. DeWitt, Jeffrey F. Naughton, J. Burger: Nested Loops Revisited. PDIS 1993: 230-242

[Fushimi 1986]

Shinya Fushimi, Masaru Kitsuregawa, Hidehiko Tanaka: An Overview of The System Software of A Parallel Relational Database Machine GRACE. VLDB 1986: 209-219

[Graefe 1993]

Goetz Graefe: Query Evaluation Techniques for Large Databases. Computing Surveys 25(2): 73-170 (1993)

[Graefe 1994]

Goetz Graefe: Sort-Merge-Join: An Idea Whose Time Has(h) Passed? ICDE 1994: 406-417

[Gray 1987]

Jim Gray, Gianfranco R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD Conference 1987: 395-398

[Hellerstein 1996]

Joseph M. Hellerstein, Jeffrey F. Naughton: Query Execution Techniques for Caching Expensive Methods. SIGMOD Conference 1996: 423-434

[Kitsuregawa 1989]

Masaru Kitsuregawa, Masaya Nakayama, Mikio Takagi: The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method. VLDB 1989: 257-266

[Red Brick 1996]

...

[Sacco 1986]

Giovanni Maria Sacco: Fragmentation: A Technique for Efficient Query Processing. TODS 11(2): 113-133 (1986)

[Schneider 1990]

Donovan A. Schneider, David J. DeWitt: Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. VLDB 1990: 469-480

[Sedgewick 1984]

...

[Selinger 1979]

Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34

[Zeller 1990]

Hansjörg Zeller, Jim Gray: An Adaptive Hash Join Algorithm for Multiuser Environments. VLDB 1990: 186-197

BIBTEX

@inproceedings{DBLP:conf/vldb/GraefeBC98,
author = {Goetz Graefe and
Ross Bunker and
Shaun Cooper},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {Hash Joins and Hash Teams in Microsoft SQL Server},
booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {86-97},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}