Overview: optimization
Query rewriting
Combining with IR
A "populating class" is extracted from the user's query. With the restricted pool of schema components currently available in the AKIRA system, the only "populating class" is the class Conference. Each "populating class" is associated with a parameterized IR query.
An IR query accesses relevant Web services (search engines, databases, etc.) to retrieve information. At present, the only IR query attached to the class Conference retrieves Calls for Papers, since the information about conferences available in the conceptual representation (dates, location, PC members, URL, etc.) can be extracted from Calls for Papers. In the future, the system could be extended to other information about conferences, such as registration_fee, program, etc., which can be extracted from Calls for Participation. To do so, the IR query associated with the class Conference would also have to retrieve Calls for Participation.
The IR query sent to the Web would then depend on the user's query in the following way. For each attribute of the target structure, the Dispatcher will have to find the right source of documents from which the associated IE tool is able to extract information. For instance, the dates of the conference can be extracted from both the Call for Papers and the Call for Participation (with the same IE tool), whereas the deadline for submissions can only be extracted from the Call for Papers, and the registration fee only from the Call for Participation. Evaluating the query then becomes somewhat more complicated, since several sources of documents may have to be retrieved, each processed by IE tools, to extract the information needed to populate a single structured cache.
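The attribute-to-source matching described above can be illustrated with a small sketch. This is not AKIRA's Dispatcher: the names `EXTRACTABLE_FROM` and `plan_sources` are hypothetical, the mapping contains only the three attributes given as examples in the text, and choosing the smallest covering set of document types is an assumption about what "the right source" might mean.

```python
from itertools import combinations

# Hypothetical mapping from target attributes to the document types an
# IE tool could extract them from (only the examples given in the text).
EXTRACTABLE_FROM = {
    "dates": {"call_for_papers", "call_for_participation"},
    "submission_deadline": {"call_for_papers"},
    "registration_fee": {"call_for_participation"},
}


def plan_sources(requested_attributes):
    """Return a smallest set of document types covering every attribute.

    Assumption: fewer document types to retrieve is better. Brute force
    is acceptable here because the number of document types is tiny.
    """
    unknown = [a for a in requested_attributes if a not in EXTRACTABLE_FROM]
    if unknown:
        raise KeyError(f"no known source for attributes: {unknown}")

    # Sorted so that ties are always broken the same way.
    all_sources = sorted(set().union(*EXTRACTABLE_FROM.values()))
    for size in range(len(all_sources) + 1):
        for combo in combinations(all_sources, size):
            chosen = set(combo)
            # Every attribute needs at least one chosen source.
            if all(EXTRACTABLE_FROM[a] & chosen for a in requested_attributes):
                return chosen
    raise ValueError("some attribute has no source at all")
```

Under these assumptions, a query asking for the dates and the registration fee needs only Calls for Participation, while a query asking for the submission deadline and the registration fee requires both document types to be retrieved and processed.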
The IR component raises several optimization issues. First, the
right service has to be chosen to retrieve information from the Web. Should
this choice be made once and for all? Or should the system allow some
flexibility and access several competing services, and thus deal with
redundant information in different formats (warehouse)? Then, for each of
these services, the right query has to be asked. What is the best
combination of keywords to obtain good recall and precision with a given
search engine? What are the relevant parameters that should be added to
better filter the retrieval step according to the user's need?
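One way to compare keyword combinations is to measure precision and recall against a hand-labelled set of relevant documents. The sketch below uses the standard definitions of the two measures; the function name `precision_recall` and the document identifiers are hypothetical and do not come from the AKIRA system.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: identifiers returned by the search service
    relevant:  identifiers judged relevant by hand (assumed available)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention chosen here: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical results of two keyword combinations for the same need.
relevant_docs = {"d1", "d2", "d5"}
broad_query = precision_recall({"d1", "d2", "d3", "d4", "d5", "d6"}, relevant_docs)
narrow_query = precision_recall({"d1", "d2"}, relevant_docs)
```

In this made-up example the broad query finds every relevant document but half of what it returns is noise, while the narrow query returns only relevant documents but misses one of them.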
Defining the Object View
Combining with IE