Rapper: A Wrapper Generator with Linguistic Knowledge.
David Mattox, Leonard J. Seligman, Kenneth Smith:
Rapper: A Wrapper Generator with Linguistic Knowledge.
Workshop on Web Information and Data Management 1999: 6-11
@inproceedings{DBLP:conf/widm/MattoxSS99,
author = {David Mattox and
Leonard J. Seligman and
Kenneth Smith},
editor = {Cyrus Shahabi},
title = {Rapper: A Wrapper Generator with Linguistic Knowledge},
booktitle = {ACM CIKM'99 2nd Workshop on Web Information and Data Management
(WIDM'99), Kansas City, Missouri, USA, November 5-6, 1999},
publisher = {ACM},
year = {1999},
pages = {6-11},
ee = {db/conf/widm/MattoxSS99.html, http://doi.acm.org/10.1145/319759.319766},
crossref = {DBLP:conf/widm/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Database management systems are becoming available for semistructured data; however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form.
Often, wrappers are needed to extract information and organize it into a graph structure that makes explicit the concepts users want to query and update.
This paper presents a new approach to wrapper generation that exploits linguistic knowledge.
The approach produces a more fine-grained parse of sources with natural language text than previous efforts.
The resulting graph structured databases answer queries that could not be formulated in databases produced by prior generated wrappers.
In addition, our approach may be more robust in the face of slight variations in word choice and order.
We discuss a prototype implementation, lessons learned to date, evaluation issues, and future research directions.
Copyright © 1999 by the ACM, Inc., used by permission.
Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.
Printed Edition
Cyrus Shahabi (Ed.):
ACM CIKM'99 2nd Workshop on Web Information and Data Management (WIDM'99), Kansas City, Missouri, USA, November 5-6, 1999.
ACM 1999
WIDM 1999 Proceedings, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:47:54 2009