
























|
 |
|
On Effective Multi-Dimensional Indexing for Strings
|
 |
H. V. Jagadish,
Nick Koudas, and
Divesh Srivastava
Return to Research Sessions
 |
|
Abstract
|
 |
As databases have expanded in scope from storing purely business data to include XML documents, product catalogs, e-mail messages, and directory data, it has become increasingly important to search databases based on wild-card string matching: prefix matching, for example, is more common (and useful) than exact matching, for such data. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. Traditional multi-dimensional index structures, designed with (fixed length) numeric data in mind, are not suitable for matching unbounded length string data.
In this paper, we describe a general technique for adapting a multi-dimensional index structure for wild-card indexing of unbounded length string data. The key ideas are (a) a carefully developed mapping function from strings to rational numbers, (b) representing an unbounded length string in an index leaf page by a fixed length offset to an external key, and (c) storing multiple elided tries, one per dimension, in an index page to prune search during traversal of index pages. These basic ideas affect all index algorithms. In this paper, we present efficient algorithms for different types of string matching.
While our technique is applicable to a wide range of multi-dimensional index structures, we instantiate our generic techniques by adapting the 2-dimensional R-tree to string data. We demonstrate the space effectiveness and time benefits of using the string R-tree both analytically and experimentally.
 |
|
References
|
 |
Note: References link to DBLP on the Web.
-
[1]
-
Lars Arge
,
Paolo Ferragina
,
Roberto Grossi
,
Jeffrey Scott Vitter
: On Sorting Strings in External Memory (Extended Abstract).
STOC 1997
: 540-548
-
[2]
-
Rudolf Bayer
,
Karl Unterauer
: Prefix B-Trees.
TODS 2(1)
: 11-26(1977)
-
[3]
-
Norbert Beckmann
,
Hans-Peter Kriegel
,
Ralf Schneider
,
Bernhard Seeger
: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.
SIGMOD Conference 1990
: 322-331
-
[4]
-
Paolo Ferragina
,
Roberto Grossi
: A Fully-Dynamic Data Structure for External Substring Search (Extended Abstract).
STOC 1995
: 693-702
-
[5]
-
Paolo Ferragina
,
Roberto Grossi
: Fast String Searching in Secondary Storage: Theoretical Developments And Experimental Results.
SODA 1996
: 373-382
-
[6]
-
Paolo Ferragina
,
Roberto Grossi
: The String B-tree: A New Data Structure for String Search in External Memory and Its Applications.
JACM 46(2)
: 236-280(1999)
-
[7]
-
Volker Gaede
,
Oliver Günther
: Multidimensional Access Methods.
Computing Surveys 30(2)
: 170-231(1998)
-
[8]
-
Gaston H. Gonnet
,
Ricardo A. Baeza-Yates
,
Tim Snider
: New Indices for Text: Pat Trees and Pat Arrays.
Information Retrieval: Data Structures & Algorithms 1992
: 66-82
-
[9]
-
Antonin Guttman
: R-Trees: A Dynamic Index Structure for Spatial Searching.
SIGMOD Conference 1984
: 47-57
-
[10]
-
...
-
[11]
-
H. V. Jagadish
,
Olga Kapitskaia
,
Raymond T. Ng
,
Divesh Srivastava
: Multi-Dimensional Substring Selectivity Estimation.
VLDB 1999
: 387-398
-
[12]
-
...
-
[13]
-
H. V. Jagadish
,
Laks V. S. Lakshmanan
,
Tova Milo
,
Divesh Srivastava
,
Dimitra Vista
: Querying Network Directories.
SIGMOD Conference 1999
: 133-144
-
[14]
-
...
-
[15]
-
Edward M. McCreight
: A Space-Economical Suffix Tree Construction Algorithm.
JACM 23(2)
: 262-272(1976)
-
[16]
-
Donald R. Morrison
: PATRICIA-Practical Algorithm To Retrieve Information Coded in Alphanumeric.
JACM 15(4)
: 514-534(1968)
-
[17]
-
John T. Robinson
: The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes.
SIGMOD Conference 1981
: 10-18
-
[18]
-
Nick Roussopoulos
,
Daniel Leifker
: Direct Spatial Search on Pictorial Databases Using Packed R-Trees.
SIGMOD Conference 1985
: 17-31
-
[19]
-
Hanan Samet
: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
-
[20]
-
Timos K. Sellis
,
Nick Roussopoulos
,
Christos Faloutsos
: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects.
VLDB 1987
: 507-518
-
[21]
-
Kenneth C. Sevcik
,
Nick Koudas
: Filter Trees for Managing Spatial Data over a Range of Size Granularities.
VLDB 1996
: 16-27
-
[22]
-
Udi Manber
,
Eugene W. Myers
: Suffix Arrays: A New Method for On-Line String Searches.
SIAM J. Comput. 22(5)
: 935-948(1993)
 |
|
BIBTEX
|
 |
@inproceedings{DBLP:conf/sigmod/JagadishKS00,
author = {H. V. Jagadish and
Nick Koudas and
Divesh Srivastava},
editor = {Weidong Chen and
Jeffrey F. Naughton and
Philip A. Bernstein},
title = {On Effective Multi-Dimensional Indexing for Strings},
booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
Management of Data, May 16-18, 2000, Dallas, Texas, USA},
journal = {SIGMOD Record},
publisher = {ACM},
volume = {29},
number = {2},
year = {2000},
isbn = {1-58113-218-2},
pages = {403-414},
crossref = {DBLP:conf/sigmod/2000},
bibsource = {DBLP, http://dblp.uni-trier.de} } },
DiSC'01 Copyright ©2002 ACM Inc.
|