WIDM'03

Finding similar identities among objects from multiple web sources


Joyce C. P. Carvalho and Altigran Soares da Silva




Abstract

When integrating data from multiple Web sources, the same object can appear in different formats and structures, which makes it difficult to identify the objects that should be matched. In this paper, we propose an approach to finding similar identities among objects from multiple Web sources. In this approach, object identification works like the relational join operation, except that a similarity function takes the place of the equality condition. This similarity function is based on information retrieval techniques. Our approach differs from others in the literature in that it can identify objects with complex structure (e.g., XML documents), not only objects with a flat structure such as relations. The effectiveness of our approach is demonstrated by experimental results on real Web data sources from different domains, which reach precision levels above 75%.
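The idea of a join whose equality condition is replaced by a similarity function can be illustrated with a minimal sketch. The abstract does not specify the similarity function, so the code below assumes cosine similarity over TF-IDF vectors, a common information retrieval measure. The names `tokenize`, `tfidf_vectors`, `cosine`, and `similarity_join`, the flat-string input, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF keeps every weight positive, even for very common terms.
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_join(left, right, threshold=0.5):
    """Return (i, j, score) for every pair with similarity >= threshold.

    This is a nested-loop join in which the usual equality test
    left[i] == right[j] is replaced by a similarity test (an assumption
    made for illustration, not the paper's actual function).
    """
    docs = [tokenize(text) for text in left + right]
    vectors = tfidf_vectors(docs)
    left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
    matches = []
    for i, u in enumerate(left_vecs):
        for j, v in enumerate(right_vecs):
            score = cosine(u, v)
            if score >= threshold:
                matches.append((i, j, score))
    return matches
```

Calling `similarity_join(source_a, source_b, threshold=0.5)` on two lists of textual object descriptions returns the index pairs judged similar, together with their scores. For structured objects such as XML documents, one possible extension would be to compare corresponding fields separately and combine the per-field scores, although this sketch handles only flat text.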


©2004 Association for Computing Machinery