Welcome to DiSC 2002
SIGMOD 2001

Modeling high-dimensional index structures using sampling


Christian A. Lang and Ambuj K. Singh



Abstract

A large number of index structures for high-dimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicable to high-dimensional data. We propose the use of sampling to predict the number of accessed index pages during a query execution. Sampling is independent of the dimensionality and preserves clusters which is important for representing skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implementation of our model under restricted memory assumptions and show that it performs well even under these constraints. Errors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling.
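
The basic idea of predicting page accesses from a sample can be illustrated with a minimal sketch. It is not the authors' model. It assumes a simple median-split bulk loader as a stand-in for a real index, a box-shaped range query, and a page capacity scaled by the sampling fraction so that the sample index has roughly as many leaf pages as the full one. All function names (`build_pages`, `pages_accessed`, `predict_pages_accessed`) are hypothetical, and the error compensation mentioned in the abstract is not attempted here.

```python
import random


def build_pages(points, capacity):
    """Bulk-load leaf pages (illustrative stand-in for a real index).

    Splits at the median along the widest dimension until each group fits
    on one page, and returns one bounding box (lo, hi) per leaf page.
    """
    dims = range(len(points[0]))
    if len(points) <= capacity:
        lo = [min(p[d] for p in points) for d in dims]
        hi = [max(p[d] for p in points) for d in dims]
        return [(lo, hi)]
    widest = max(
        dims,
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return build_pages(ordered[:mid], capacity) + build_pages(ordered[mid:], capacity)


def pages_accessed(pages, q_lo, q_hi):
    """Count leaf pages whose bounding box intersects the query box."""
    return sum(
        all(lo[d] <= q_hi[d] and hi[d] >= q_lo[d] for d in range(len(q_lo)))
        for lo, hi in pages
    )


def predict_pages_accessed(points, capacity, fraction, q_lo, q_hi, rng):
    """Estimate page accesses from a random sample (assumed scheme).

    The page capacity is scaled by the sampling fraction, so the count on
    the sample index is used directly as the estimate for the full index.
    """
    sample = rng.sample(points, max(1, int(len(points) * fraction)))
    sample_capacity = max(1, int(capacity * fraction))
    return pages_accessed(build_pages(sample, sample_capacity), q_lo, q_hi)


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8
    data = [[rng.random() for _ in range(dim)] for _ in range(2000)]
    q_lo, q_hi = [0.2] * dim, [0.8] * dim

    actual = pages_accessed(build_pages(data, 40), q_lo, q_hi)
    predicted = predict_pages_accessed(data, 40, 0.25, q_lo, q_hi, rng)
    print("actual pages accessed:   ", actual)
    print("predicted pages accessed:", predicted)
```

In this sketch the bounding boxes built from a sample can be tighter than those of the full index, so the raw estimate may be biased. Correcting for that kind of error is what the abstract refers to as compensation, and it is left out here.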


DiSC'02 © 2003 Association for Computing Machinery