SIGMOD 2003

Statistical Schema Matching across Web Query Interfaces


Bin He and Kevin Chen-Chuan Chang



Abstract

Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise-attribute correspondence. This paper proposes a different approach, motivated by integrating large numbers of data sources on the Internet. On this "deep Web," we observe two distinguishing characteristics that offer a new view for considering schema matching: First, as the Web scales, there are ample sources that provide structured information in the same domains (e.g., books and automobiles). Second, while sources proliferate, their aggregate schema vocabulary tends to converge at a relatively small size. Motivated by these observations, we propose a new paradigm, statistical schema matching: Unlike traditional approaches using pairwise-attribute correspondence, we take a holistic approach to match all input schemas by finding an underlying generative schema model. We propose a general statistical framework, MGS, for such hidden model discovery, which consists of hypothesis modeling, generation, and selection. Further, we specialize the general framework to develop Algorithm MGSsd, which targets synonym discovery, a canonical problem of schema matching, by designing and discovering a model that specifically captures synonym attributes. We demonstrate our approach over hundreds of real Web sources in four domains, and the results show good accuracy.

BIBTEX


@inproceedings{DBLP:conf/sigmod/HeC03,
  author    = {Bin He and
               Kevin Chen-Chuan Chang},
  booktitle = {SIGMOD Conference},
  title     = {Statistical Schema Matching across Web Query Interfaces.},
  pages     = {217-228},
  year      = {2003},
  url       = {db/conf/sigmod/sigmod2003.html#HeC03},
  ee        = {http://www.acm.org/sigmod/sigmod03/eproceedings/papers/r08p03.pdf},
  crossref  = {conf/sigmod/2003},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}



©2004 Association for Computing Machinery