Interaction of Query Evaluation and Buffer Management for Information Retrieval
Bjorn Por Jonsson, Michael J. Franklin, Divesh Srivastava
Full Paper (PDF)

Slides (HTML)

Abstract
The proliferation of the World Wide Web has brought information retrieval (IR) techniques to the forefront of search technology. To the average computer user, "searching" now means using IR-based systems for finding information on the WWW or in other document collections. IR query evaluation methods and workloads differ significantly from those found in database systems. In this paper, we focus on three such differences. First, due to the inherent fuzziness of the natural language used in IR queries and documents, an additional degree of flexibility is permitted in evaluating queries. Second, IR query evaluation algorithms tend to have access patterns that cause problems for traditional buffer replacement policies. Third, IR search is often an iterative process, in which a query is repeatedly refined and resubmitted by the user. Based on these differences, we develop two complementary techniques to improve the efficiency of IR queries: 1) Buffer-aware query evaluation, which alters the query evaluation process based on the current contents of buffers; and 2) Ranking-aware buffer replacement, which incorporates knowledge of the query processing strategy into replacement decisions. In a detailed performance study we show that using either of these techniques yields significant performance benefits and that in many cases, combining them produces even further improvements.

References

References, where available, link to the DBLP on the World Wide Web.

[ABGM90]
Rafael Alonso, Daniel Barbará, Hector Garcia-Molina: Data Caching Issues in an Information Retrieval System. TODS 15(3): 359-384(1990)
[Bro95]
Eric W. Brown: Fast Evaluation of Structured Queries for Information Retrieval. SIGIR 1995: 30-38
[Bro97]
...
[CD85]
Hong-Tai Chou, David J. DeWitt: An Evaluation of Buffer Management Strategies for Relational Database Systems. VLDB 1985: 127-141
[CK97]
Michael J. Carey, Donald Kossmann: On Saying "Enough Already!" in SQL. SIGMOD Conference 1997: 219-230
[CR93]
Chungmin Melvin Chen, Nick Roussopoulos: Adaptive Database Buffer Allocation Using Query Feedback. VLDB 1993: 342-353
[DFJ+96]
Shaul Dar, Michael J. Franklin, Björn Þór Jónsson, Divesh Srivastava, Michael Tan: Semantic Data Caching and Replacement. VLDB 1996: 330-341
[EH84]
Wolfgang Effelsberg, Theo Härder: Principles of Database Buffer Management. TODS 9(4): 560-595(1984)
[Fal85]
Christos Faloutsos: Access Methods for Text. Computing Surveys 17(1): 49-74(1985)
[Fid91]
...
[FJK96]
Michael J. Franklin, Björn Þór Jónsson, Donald Kossmann: Performance Tradeoffs for Client-Server Query Processing. SIGMOD Conf. 1996: 149-160
[Fox92]
...
[Fra92]
...
[Har96]
...
[HHW97]
Joseph M. Hellerstein, Peter J. Haas, Helen Wang: Online Aggregation. SIGMOD Conference 1997: 171-182
[Ink]
...
[JS94]
Theodore Johnson, Dennis Shasha: 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. VLDB 1994: 439-450
[KK94]
Alfons Kemper, Donald Kossmann: Dual-Buffering Strategies in Object Bases. VLDB 1994: 427-438
[KQCB94]
Jürgen Koenemann, Richard Quatrain, Colleen Cool, Nicholas J. Belkin: New Tools and Old Habits: The Interactive Searching Behavior of Expert Online Searches using INQUERY. TREC 1994: 0-
[MZ94]
Alistair Moffat, Justin Zobel: Fast Ranking in Limited Space. ICDE 1994: 428-437
[NFS91]
Raymond T. Ng, Christos Faloutsos, Timos K. Sellis: Flexible Buffer Allocation Based on Marginal Gains. SIGMOD Conference 1991: 387-396
[OOW93]
Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD Conference 1993: 297-306
[Per94]
Michael Persin: Document Filtering for Fast Ranking. SIGIR 1994: 339-348
[PZSD96]
Michael Persin, Justin Zobel, Ron Sacks-Davis: Filtered Document Retrieval with Frequency-Sorted Indexes. JASIS 47(10): 749-764(1996)
[SA87]
Patricia Simpson, Rafael Alonso: Data Caching in Information Retrieval Systems. SIGIR 1987: 296-305
[SB88]
Gerard Salton, Chris Buckley: Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24(5): 513-523(1988)
[SB90]
...
[SS96]
Sunita Sarawagi, Michael Stonebraker: Reordering Query Execution in Tertiary Memory Databases. VLDB 1996: 156-167
[Sto81]
Michael Stonebraker: Operating System Support for Database Management. CACM 24(7): 412-418(1981)
[TF95]
Howard R. Turtle, James Flood: Query Evaluation: Strategies and Optimizations. Information Processing and Management 31(6): 831-850(1995)
[TGM93a]
Anthony Tomasic, Hector Garcia-Molina: Caching and Database Scaling in Distributed Shard-Nothing Information Retrieval Systems. SIGMOD Conference 1993: 129-138
[TGM93b]
Anthony Tomasic, Hector Garcia-Molina: Performance of Inverted Indices in Distributed Text Document Retrieval Systems. PDIS 1993: 8-17
[Tra95]
...
[Tur94]
Howard R. Turtle: Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance. SIGIR 1994: 212-220
[VH97]
...
[WL93]
Wai Yee Peter Wong, Dik Lun Lee: Implementations of Partial Document Ranking Using Inverted Files. Information Processing and Management 29(5): 647-669(1993)
[ZMSD92]
Justin Zobel, Alistair Moffat, Ron Sacks-Davis: An Efficient Indexing Technique for Full Text Databases. VLDB 1992: 352-362
BIBTEX

@inproceedings{DBLP:conf/sigmod/JonssonFS98,
author = {Bj{\"o}rn TH{\'o}r J{\'o}nsson and
Michael J. Franklin and
Divesh Srivastava},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Interaction of Query Evaluation and Buffer Management for Information
Retrieval},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-955-5},
pages = {118-129},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).