DiSC - SIGMOD Paper Index

Similarity Query Processing Using Disk Arrays

Apostolos Papadopoulos, Yannis Manolopoulos

Full Paper (PDF)
Slides (PDF)

Abstract

Similarity queries are fundamental operations that are used extensively in many modern applications, whereas disk arrays are powerful storage media of increasing importance. The basic trade-off in similarity query processing in such a system is that increased parallelism leads to higher resource consumptions and low throughput, whereas low parallelism leads to higher response times. Here, we propose a technique which is based on a careful investigation of the currently available data in order to exploit parallelism up to a point, retaining low response times during query processing. The underlying access method is a variation of the R*-tree, which is distributed among the components of a disk array, whereas the system is simulated using event-driven simulation. The performance results conducted, demonstrate that the proposed approach outperforms by factors a previous branch-and-bound algorithm and a greedy algorithm which maximizes parallelism as much as possible. Moreover, the comparison of the proposed algorithm to a hypothetical (non-existing) optimal one (with respect to the number of disk accesses) shows that the former is on average two times slower than the latter.

References

References, where available, link to the DBLP on the World Wide Web.

[1]

Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331

[2]

Alberto Belussi, Christos Faloutsos: Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995: 299-310

[3]

Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39

[4]

Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86

[5]

Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel: Fast Similarity Search in Multimedia Databases. SIGMOD Conference 1997: 1-12

[6]

Peter M. Chen, Edward L. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson: RAID: High-Performance, Reliable Secondary Storage. Computing Surveys 26(2): 145-185(1994)

[7]

Christos Faloutsos, Ibrahim Kamel: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. PODS 1994: 4-13

[8]

Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos: Fast Subsequence Matching in Time-Series Databases. SIGMOD Conference 1994: 419-429

[9]

...

[10]

Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57

[11]

Ibrahim Kamel, Christos Faloutsos: Parallel R-trees. SIGMOD Conference 1992: 195-204

[12]

Ibrahim Kamel, Christos Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. VLDB 1994: 500-509

[13]

Norio Katayama, Shin'ichi Satoh: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. SIGMOD Conference 1997: 369-380

[14]

King-Ip Lin, H. V. Jagadish, Christos Faloutsos: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB Journal 3(4): 517-542(1994)

[15]

...

[16]

Bernd-Uwe Pagel, Hans-Werner Six, Heinrich Toben, Peter Widmayer: Towards an Analysis of Range Query Performance in Spatial Data Structures. PODS 1993: 214-221

[17]

Apostolos Papadopoulos, Yannis Manolopoulos: Performance of Nearest Neighbor Queries in R-Trees. ICDT 1997: 394-408

[18]

David A. Patterson, Garth A. Gibson, Randy H. Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID). SIGMOD Conference 1988: 109-116

[19]

Nick Roussopoulos, Stephen Kelley, Frédéic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79

[20]

Chris Ruemmler, John Wilkes: An Introduction to Disk Drive Modeling. IEEE Computer 27(3): 17-28(1994)

[21]

Bernhard Seeger, Per-Åke Larson: Multi-Disk B-trees. SIGMOD Conference 1991: 436-445

[22]

Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518

[23]

Michael Stonebraker, James Frew, Kenn Gardels, Jeff Meredith: The Sequoia 2000 Benchmark. SIGMOD Conference 1993: 2-11

[24]

Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171

[25]

...

[26]

David A. White, Ramesh Jain: Similarity Indexing with the SS-tree. ICDE 1996: 516-523

[27]

Yvonne Zhou, Shashi Shekhar, Mark Coyle: Disk Allocation Methods for Parallelizing Grid Files. ICDE 1994: 243-252

BIBTEX

@inproceedings{DBLP:conf/sigmod/PapadopoulosM98,
author = {Apostolos Papadopoulos and
Yannis Manolopoulos},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Similarity Query Processing Using Disk Arrays},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-955-5},
pages = {225-236},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}