2002 Digital Symposium Collection

Modeling high-dimensional index structures using sampling

Christian A. Lang and Ambuj K. Singh
View Paper (PDF)

Return to Multi-dimensional Data

Abstract

A large number of index structures for highdimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicable to highdimensional data. We propose the use of sampling to predict the number of accessed index pages during a query execution. Sampling is independent of the dimensionality and preserves clusters which is important for representing skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implementa tion of our model under restricted memory assumptions and show that it performs well even under these constraints. Er rors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling.