Welcome to D
SIGMOD 2003
PODS 2003
<<< = PODS'03 Pape>>>
SIGMOD-RECOR
ADBIS
CIDR 2003
CIKM 2003
DASFAA 2003
Data Enginee
DEBS
DMKD 2003
DOLAP 2003
DPDJ 2003
ER
GIS 2003
Hypertext 20
ICDE 2003
ICDM 2003
ICDT 2003
JCDL 2003
KRDB 2003
MIR 2003
MIS 2003
MMDB 2003
RIDE 2003
SBBD 2003
SIGIR 2003
SIGIR-FORUM
SIGKDD 2003
SIGKDD-EXP
SSDBM 2003
TIME 2003
TODS
VLDB 2003
VLDB Journal
WIDM 2003

Data exchange: getting to the core


Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa

  View Paper (PDF)  

Return to Data Integration


Abstract

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier paper, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a most general possible solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a best universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we show that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. After this, we investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P is equal to NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. Finally, we analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core of G? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem.

BIBTEX


@inproceedings       {DBLP:conf/pods/FaginKP03,
  author    = {Ronald Fagin and
                Phokion G. Kolaitis and
                Lucian Popa},
   booktitle = {PODS},
   title     = {Data exchange: getting to the core.},
   pages     = {90-101},
   year      = {2003},
   url       = {db/conf/pods/pods2003.html#FaginKP03},
   ee        = {http://doi.acm.org/10.1145/773153.773163},
   crossref  = {conf/pods/2003},
   bibsource = {DBLP, http://dblp.uni-trier.de} 
}



©2004 Association for Computing Machinery