Welcome to DiSC 2002
SIGMOD 2001
 = SIGMOD'01 Website
 = SIGMOD/PODS'01 Plena
<<< = SIGMOD'01 Papers>>>
 = Demos
 = Industrial Sessions
 = Panels
 = Tutorials
PODS 2001
 SIGMOD RECORD 2001
CIKM 2001
CoopIS 2001
DASFAA 2001
DASFAA 2000
DBPL 2001
Data Engineering Bul
DEXA_EC-WEB 2001
DMKD 2001
 DPDJ 2001
HYPERTEXT 2001
ICDE 2001
ICDM 2001
ICDT 2001
JCDL 2001
KDD 2001
 KDD_EXPLORATIONS 20
KRDB 2001
MDM 2001
MIR 2001
MIS 2001
RIDE 2001
SBBD 2001
 SIGIR 2001
 SIGIR FORUM 2001
SSDBM 2001
SSTD 2001
TODS 2001
TIME 2001
VLDB 2001
VLDBJ 2001

On supporting containment queries in relational database management systems


Chun Zhang, Jeffrey F. Naughton, David J. DeWitt, Qiong Luo, and Guy M. Lohman

  View Paper (PDF)  

Return to XML


Abstract

Virtually all proposals for querying XML include a class of query we term "containment queries". It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under certain conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementation in an RDBMS can support this class of query much more efficiently.


DiSC'02 © 2003 Association for Computing Machinery