2006 Digital Symposium Collection

Persistent Clustered Main Memory Index for Accelerating k-NN Queries on High Dimensional Datasets

Lijuan Zhang and Alexander Thomasian
View Paper (PDF)

Return to Paper Session III: High-Dimensional Data Analysis and Indexing

Abstract

Similarity search implemented via k-Nearest-Neighbor (k-NN) queries is an extremely useful paradigm in content based image retrieval (CBIR), which is costly on high-dimensional indices due to the curse of dimensionality. We improve k-NN query processing by utilizing the double filltering effect of clustering and indexing on a persistent version of the Ordered-Partition tree (OP-tree) index, which is highly effficient in processing k-NN queries. The OP-tree is made persistent by writing it onto disk after serialization, i.e. arranging its nodes into contiguous memory locations, so that the high transfer rate of modern disk drives is exploited. We first report experimental results to optimize OP-tree parameters. We then compare OP-trees and sequential scans with options for the Karhunen transform and Euclidean distance calculation. Comparisons against OMNI-based sequential scan are also reported. We finally compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-File method. It is observed that the OP-tree index outperforms the other two methods and that the improvement increases with the number of dimensions.