Providing Database-like Access to the Web Using Queries Based on Textual Similarity
Most databases contain "name constants" like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. Here we assume instead that the names are given in natural language text. We then propose a logic for database integration called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. An implemented data integration system based on WHIRL has been used to successfully integrate information from several dozen Web sites in two domains.
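As a rough illustration of what measuring name similarity with the vector-space model can look like, the sketch below scores pairs of name strings by the cosine of their TF-IDF vectors. It is not taken from the paper: the tokenizer, the particular TF-IDF weighting, the function names (`tokenize`, `tfidf_vectors`, `cosine`), and the sample names are all assumptions chosen for illustration, and WHIRL itself may weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(name):
    # Assumed, simplistic tokenizer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", name.lower())


def tfidf_vectors(names):
    """Return one unit-length TF-IDF vector (as a dict) per name.

    The weighting (1 + log tf) * log(1 + N / df) is one common variant,
    assumed here; it is not necessarily the one used in WHIRL.
    """
    docs = [Counter(tokenize(n)) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names does each term occur?
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        vec = {
            term: (1 + math.log(tf)) * math.log(1 + n_docs / df[term])
            for term, tf in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    # For unit-length vectors, cosine similarity is just the dot product.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


# Hypothetical name constants from different sources.
names = [
    "Carnegie Mellon University",
    "Carnegie-Mellon Univ.",
    "University of Washington",
]
vecs = tfidf_vectors(names)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]!r} ~ {names[j]!r}: {cosine(vecs[i], vecs[j]):.3f}")
```

Under this scheme, two names that share several tokens score higher than names that share only one, so no hand-written normalization rule is needed to relate the first two strings.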
@inproceedings{DBLP:conf/sigmod/Cohen98a,
  author    = {William W. Cohen},
  editor    = {Laura M. Haas and Ashutosh Tiwary},
  title     = {Providing Database-like Access to the Web Using Queries Based on Textual Similarity},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-955-5},
  pages     = {558-560},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).