Index Design for Structured Documents Based on Abstraction
Jyh-Herng Chow,
Josephine Cheng,
Daniel Chang, and
Jane Xu
Return to Session 2A: Document Retrieval
HTML has been the standard format for delivering information on the web.
However, automated processing of these documents for data exchange
and interoperability has been difficult. XML, a subset of SGML, has been
proposed as the next standard format. It allows user-defined tags that better
describe nested document structures and their associated semantics. Operations on
structured documents, such as searching in nested document structures, require
functions not available on most current systems. We describe a
general framework for manipulating structured documents based on document
abstractions. An abstraction is an approximation of an actual document that
retains properties useful for the analyses of interest. The framework provides a
wide design space for trading off cost against capability. It can be applied
to index design, document searching, and categorization.
We present this framework by focusing on the indexing and
searching of structured documents in the XML domain, and we prove the soundness of both.
We also address the issues of rich data types in XML documents.
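To make the idea of an abstraction concrete, the following is a minimal sketch and not the construction used in the paper. It assumes one particular abstraction, the set of root-to-element tag paths in a document, which discards text content, sibling order, and repetition. The names `path_abstraction`, `build_index`, and `candidates` are hypothetical and chosen only for illustration. Under this assumption, a lookup for a root-anchored tag path never misses a document that contains it, which is the sense of soundness illustrated here.

```python
import xml.etree.ElementTree as ET


def path_abstraction(xml_text):
    """Return the set of root-to-element tag paths in the document.

    This is an assumed, illustrative abstraction: it keeps the nesting
    structure but drops text, attributes, order, and multiplicity.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, ())
    return paths


def build_index(docs):
    """Map each tag path to the ids of documents whose abstraction has it."""
    index = {}
    for doc_id, xml_text in docs.items():
        for path in path_abstraction(xml_text):
            index.setdefault(path, set()).add(doc_id)
    return index


def candidates(index, query_path):
    """Return ids of documents that contain the root-anchored tag path."""
    return index.get(tuple(query_path), set())


# Two small hypothetical documents.
docs = {
    "d1": "<book><title>XML</title><chapter><section/></chapter></book>",
    "d2": "<article><title>Indexing</title><section/></article>",
}
index = build_index(docs)
hits = candidates(index, ["book", "chapter", "section"])
```

A coarser abstraction (for example, only the set of tag names) would give a smaller index but more false candidates to verify against the actual documents, while a finer one costs more to store and answers more queries directly. This is one way to read the tradeoff between cost and capability described above.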
Copyright(C) 2000 ACM