Welcome to D
SIGMOD'00
PODS'00
SIGMOD Recor
CIKM 2000/CI
COMAD 2000
Data Enginee
DL 2000
DPDJ
EDBT 2000
Hypertext 20
ICDE 2000
<<< = ICDE'00 Pape>>>
KDD 2000
KDD Explorat
KRDB 2000
SBBD 2000
SIGIR 2000
SIGIR Forum
SSDBM 2000
TODS
VLDB'00
VLDBJ

PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces


P. Ciaccia and M. Patella

  View Paper (PDF)  

Return to Multimedia Retrieval


Abstract


PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures - the so-called "curse of dimensionality". This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result. Keywords: Nearest Neighbor Search, Metric Spaces, Curse of Dimensionality, Approximate Queries, Distance Distribution



DiSC'01 Copyright ©2002 ACM Inc.