2002 Digital Symposium Collection

VQBD: exploring semistructured data

Sudarshan S. Chawathe, Thomas Baby, and Jihwang Yeo

Abstract

The VQBD ("veecubed") project addresses the following problem: What is the best way to explore an XML document of unknown structure and content? We use data exploration to denote the interactive task of gathering the information needed to use data for purposes such as generating a report, writing queries, building user interfaces, and writing applications. We focus on XML documents that are too large to browse in their entirety, even with the assistance of pretty-printing software (e.g., multimegabyte or larger XML documents). In a relational or object database, the schema (e.g., table definitions, class definitions, integrity constraints, and stored procedures) provides some of the information necessary for writing queries and applications. However, the schema is rarely sufficient for these tasks. Typically, one must probe and browse the database to discover data coverage, typical and exceptional values, and other information required to gain a better understanding of the database. In an XML environment, the need for such data exploration is much greater because it is quite likely that the XML data of interest is not accompanied by a schema. Indeed, much XML data is semistructured, meaning its structure is irregular, incomplete, and frequently changing. The rapid adoption of XML as a data exchange standard makes this semistructured data exploration problem increasingly important. The VQBD system allows the structured exploration of arbitrary XML data. We describe some key features very briefly below; a detailed description appears at http://www.cs.umd.edu/projects/vqbd/.
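
To make the notion of probing a schema-less XML document concrete, the following is a minimal sketch in Python. It is not part of VQBD and does not reflect how VQBD is implemented; it only illustrates one simple kind of exploration, assuming that counting how often each element path occurs is a useful first look at structure and coverage. The function name `summarize_paths` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_paths(source):
    """Stream an XML document and count how often each element path occurs.

    Illustrative only: `source` is a filename or a binary file-like object.
    """
    counts = Counter()
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            counts["/" + "/".join(stack)] += 1
        else:
            stack.pop()
            elem.clear()  # drop the finished subtree's contents to limit memory use
    return counts


# Hypothetical sample: the second <book> has no <year> element.
sample = (
    b"<lib>"
    b"<book><title>A</title><year>2001</year></book>"
    b"<book><title>B</title></book>"
    b"</lib>"
)
counts = summarize_paths(io.BytesIO(sample))
for path, n in sorted(counts.items()):
    print(n, path)
```

In this sample, `/lib/book` occurs twice but `/lib/book/year` only once, which is the kind of irregular coverage that a schema, if one existed at all, would not reveal by itself.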