Crossing the Structure Chasm


Alon Y. Halevy, Oren Etzioni, AnHai Doan, Zachary G. Ives, Jayant Madhavan, Luke McDowell, and Igor Tatarinov


Abstract

It has frequently been observed that most of the world's data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructured realm to others. The world of unstructured data has several very appealing properties, such as ease of authoring, querying and data sharing. In contrast, authoring, querying and sharing structured data require significant effort, albeit with the benefit of rich query languages and exact answers. We argue that in order to broaden the use of data management tools, we need a concerted effort to cross this structure chasm, by importing the attractive properties of the unstructured world into the structured one. As an initial effort in this direction, we introduce the REVERE System, which offers several mechanisms for crossing the structure chasm, and considers as its first application the chasm on the WWW. REVERE includes three innovations: (1) a data creation environment that entices people to structure data and enables them to do it rapidly; (2) a data sharing environment, based on a peer data management system, in which a web of data is created by establishing local mappings between schemas, and query answering is done over the transitive closure of these mappings; (3) a novel set of tools that are based on computing statistics over corpora of schemata and structured data. In a sense, we are trying to adapt the key techniques of the unstructured world, namely computing statistics over text corpora, into the world of structured data. We sketch how statistics computed over such corpora, which capture common term usage patterns, can be used to create tools for assisting in schema and mapping development. The initial application of REVERE focuses on creating a web of structured data from data that is usually stored in HTML web pages (e.g., personal information, course information, etc.).
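
The idea of answering queries over the transitive closure of local schema mappings can be illustrated with a minimal sketch. It is not REVERE's algorithm: it only computes which peers become reachable from a starting peer by following mappings, and it does not reformulate queries along the way. The peer names, the `mappings` dictionary, and the `reachable_peers` function are hypothetical and chosen purely for illustration.

```python
from collections import deque


def reachable_peers(mappings, start):
    """Return every peer reachable from `start` by following mappings transitively.

    `mappings` is an assumed representation: a dict from a peer name to the
    peers whose schemas it has a direct (local) mapping to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor in mappings.get(peer, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical peers, each with only local mappings to its neighbors.
mappings = {
    "uw": ["upenn"],
    "upenn": ["wisc"],
    "wisc": [],
    "mit": ["uw"],
}

print(sorted(reachable_peers(mappings, "uw")))
```

In this toy setting a query posed at `uw` could draw on data at `wisc` even though no direct mapping between the two exists, which is the property the abstract attributes to the web of mappings.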

BIBTEX


@inproceedings{DBLP:conf/cidr/HalevyED03,
  author    = {Alon Y. Halevy and
               Oren Etzioni and
               AnHai Doan and
               Zachary G. Ives and
               Jayant Madhavan and
               Luke McDowell and
               Igor Tatarinov},
  booktitle = {CIDR},
  title     = {Crossing the Structure Chasm.},
  year      = {2003},
  url       = {db/conf/cidr/cidr2003.html#HalevyED03},
  ee        = {http://www-db.cs.wisc.edu/cidr/program/p11.pdf},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}



©2004 Association for Computing Machinery