2003 Digital Symposium Collection

Operadores de Seleção por Similaridade para Sistemas de Gerenciamento de Bases de Dados Relacionais (Similarity Selection Operators for Relational Database Management Systems)

Adriano S. Arantes, Marcos R. Vieira, Caetano Traina Jr., and Agma J. M. Traina


Abstract

Search operations on complex datasets rely on similarity-based comparison criteria, because equality comparisons are rarely useful and comparisons based on ordering relationships cannot be applied, given the nature of these datasets. There are two basic operators for similarity queries: the Range Query and the k-Nearest Neighbor Query. A great deal of research has gone into efficient algorithms for each of these operators. However, algorithms that treat them as parts of a more complex operation (a composition of the two) have not yet been developed. This article presents two new algorithms, named kAndRange and kOrRange, which are designed to answer conjunctions and disjunctions of these similarity criteria. The new algorithms were tested with sequential scan and with a metric access method called Slim-tree. Experiments on real and synthetic datasets show that the new algorithms outperform the composition of the two basic operators on these complex similarity queries in every measured aspect, running up to 40 times faster. This improvement is an essential step toward the practical use of similarity operators in Relational Database Systems.
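
To make the semantics of the combined queries concrete, the following Python sketch answers a conjunction and a disjunction of a range criterion and a k-nearest-neighbor criterion in a single sequential scan. It is only an illustration under stated assumptions and not the kAndRange and kOrRange algorithms of the paper, whose details are not given in this abstract. The function names `k_and_range` and `k_or_range`, the Euclidean distance, and the tuple-based data layout are all hypothetical choices made for the example, and ties in distance are broken arbitrarily.

```python
import heapq
from math import dist  # Euclidean distance, Python 3.8+


def _push_nearest(heap, k, d, i):
    """Keep the k smallest distances in a max-heap (distances are negated)."""
    if len(heap) < k:
        heapq.heappush(heap, (-d, i))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, i))


def k_and_range(data, q, k, r):
    """Objects that are among the k nearest to q AND within radius r of q.

    Assumed semantics: intersection of the range result and the k-NN result.
    Objects farther than r can never qualify, so they are skipped outright.
    """
    heap = []
    for i, p in enumerate(data):
        d = dist(p, q)
        if d <= r:
            _push_nearest(heap, k, d, i)
    return sorted((-nd, i) for nd, i in heap)


def k_or_range(data, q, k, r):
    """Objects that are among the k nearest to q OR within radius r of q.

    Assumed semantics: union of the range result and the k-NN result.
    If at least k objects lie within r, the range result already contains
    the k nearest; otherwise the k-NN result contains the range result.
    """
    in_range, heap = [], []
    for i, p in enumerate(data):
        d = dist(p, q)
        if d <= r:
            in_range.append((d, i))
        _push_nearest(heap, k, d, i)
    if len(in_range) >= k:
        return sorted(in_range)
    return sorted((-nd, i) for nd, i in heap)
```

Both functions return a list of `(distance, index)` pairs sorted by distance. Each one reads the dataset once, whereas composing the two basic operators would run a range query and a k-nearest-neighbor query separately and then intersect or unite their results.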