Advanced Query Processing (Session C5)

Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched. This requirement can be cumbersome and inflexible from a user perspective: good answers to a keyword query might need to be "assembled" - in perhaps unforeseen ways - by joining tuples from multiple relations. This observation has motivated recent research on free-form keyword search over RDBMSs. In this paper, we adapt IR-style document-relevance ranking strategies to the problem of processing free-form keyword queries over RDBMSs. Our query model can handle queries with both AND and OR semantics, and exploits the sophisticated single-column text-search functionality often available in commercial RDBMSs. We develop query-processing strategies that build on a crucial characteristic of IR-style keyword search: only the few most relevant matches - according to some definition of "relevance" - are generally of interest. Consequently, rather than computing all matches for a keyword query, which leads to inefficient executions, our techniques focus on the top-k matches for the query, for moderate values of k. A thorough experimental evaluation over real data shows the performance advantages of our approach.

©2004 Association for Computing Machinery