When querying a database, a user is restricted to a given and frozen
organization of information, enforced by the creator when designing the
schema. Should the user send a request beyond the schema, he will be denied
access to the expected information. The creator has imposed his view on
the user. This is the limitation of the source-driven approach.
Our user-oriented paradigm grants more flexibility by allowing
the user to design his own view on demand. There is no magic: the limits
are transferred to the extraction capabilities of the system. AKIRA's administrator
is in charge of representing these capabilities at a conceptual level in
terms of schema components, in a modular way. He provides the user
with a pool of schema components that can be combined to specify the
user's view.
Concepts, meta-concepts and attributes
We consider three extraction capabilities: extracting concepts,
meta-concepts and attributes, which we represent in an abstract
way in a pool of schema components. Extracting a concept
goes beyond the usual retrieval of keywords, even though the extracting
technique may use a list of keywords. We suppose that a concept
is extracted when the following steps are accomplished: (1) recognition,
and (2) identification (canonical representation). The recognition phase
consists in selecting fragments of documents which satisfy a given criterion.
The retrieved fragments are identified in a second step. An IE tool capable
of recognizing and identifying conference names is represented by a concept
class Conference with attribute name.
Similarly, a date extraction tool corresponds to a class Date
with attributes month, day and year. Each of these classes is a schema
component by itself. A concept class can be specialized according to other
extraction capabilities. For example, attribute topic,
listing topics of interest, can specialize class Conference.
Two concept classes can also be combined through a meta-concept
such as submission_deadline to assemble a new conceptual schema.
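
As an illustration only, the two-step extraction of a concept and the
combination of schema components might be sketched in Python as follows.
The class layout, the regular expression, the month table and the function
names recognize_dates and identify_date are assumptions made for this
sketch; they are not AKIRA's implementation, and the conference name is a
placeholder.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical lookup table used for identification (not taken from AKIRA).
MONTHS = {m: i for i, m in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}

@dataclass(frozen=True)
class Date:
    """Concept class standing for a date extraction capability."""
    month: int
    day: int
    year: int

@dataclass(frozen=True)
class Conference:
    """Concept class standing for a conference-name extraction capability."""
    name: str

@dataclass(frozen=True)
class ConferenceWithTopics(Conference):
    """Conference specialized by the attribute topic."""
    topic: Tuple[str, ...] = ()

@dataclass(frozen=True)
class SubmissionDeadline:
    """Meta-concept combining the two concept classes."""
    conference: Conference
    date: Date

# Assumed recognition criterion: "<Month> <day>, <year>" written in English.
DATE_PATTERN = re.compile(r"\b([A-Za-z]{3})[a-z]*\.? (\d{1,2}), (\d{4})\b")

def recognize_dates(text: str) -> List[str]:
    """Step 1 (recognition): select the fragments satisfying the criterion."""
    return [m.group(0) for m in DATE_PATTERN.finditer(text)]

def identify_date(fragment: str) -> Optional[Date]:
    """Step 2 (identification): map a fragment to its canonical representation."""
    m = DATE_PATTERN.fullmatch(fragment)
    if m is None:
        return None
    month = MONTHS.get(m.group(1).lower())
    if month is None:  # recognized by shape, but not an actual date
        return None
    return Date(month, int(m.group(2)), int(m.group(3)))

# Usage: assemble a small conceptual schema instance from the components.
conf = ConferenceWithTopics(name="ExampleConf", topic=("databases",))
fragments = recognize_dates("Papers for ExampleConf are due on March 15, 1998.")
deadline = SubmissionDeadline(conf, identify_date(fragments[0]))
```

Keeping recognition and identification as separate functions mirrors the
two steps above: a fragment such as "Room 12, 2000" passes the recognition
criterion of this sketch but is rejected at identification.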