2003 Digital Symposium Collection

Path Queries on Compressed XML

Peter Buneman, Martin Grohe, and Christoph Koch
View Paper (PDF)

Return to XML Queries Processing I (Session A4)

Abstract

Central to any XML query language is a path language such as XPath which operates on the tree structure of the XML document. We demonstrate in this paper that the tree struc- ture can be effectively compressed and ma- nipulated using techniques derived from sym- bolic model checking. Specifically, we show first that succinct representations of document tree structures based on sharing subtrees are highly effective. Second, we show that com- pressed structures can be queried directly and efficiently through a process of manipulating selections of nodes and partial decompression. We study both the theoretical and experimen- tal properties of this technique and provide algorithms for querying our compressed in- stances using node-selecting path query lan- guages such as XPath. We believe the ability to store and manipulate large portions of the structure of very large XML documents in main memory is crucial to the development of efficient, scalable native XML databases and query engines.