Welcome to DiSC 2002
PODS'01 Papers

Optimal Aggregation Algorithms for Middleware


Ronald Fagin, Amnon Lotem, and Moni Naor


Abstract

Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade that tells how red it is, and a shape grade that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine the objects that have the best overall grades, the naive algorithm must access every object in the database to find its grade under each attribute. Fagin has given an algorithm ("Fagin's Algorithm", or FA) that is much more efficient. For some distributions on grades, and for some monotone aggregation functions, FA is optimal in a high-probability sense. We analyze an elegant and remarkably simple algorithm ("the threshold algorithm", or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of an object in a list, and obtains it in one step). We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well.
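
The abstract does not spell out the steps of TA, so the following Python sketch is only an illustration based on the usual description of the threshold algorithm: read the sorted lists in parallel, look up each newly seen object's remaining grades by random access, and stop once the k best overall grades found so far are at least the aggregate of the grades last seen under sorted access (the threshold). The function names `naive_top_k` and `threshold_top_k`, the dict-based representation of the lists, and the sample grades are all assumptions made for this example; random access is simulated by a dictionary lookup, and every object is assumed to appear in every list.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Grades = Dict[str, float]  # object id -> grade under one attribute
Aggregate = Callable[[Sequence[float]], float]


def naive_top_k(lists: List[Grades], agg: Aggregate, k: int) -> List[Tuple[str, float]]:
    """Baseline: access every object's grade under every attribute."""
    overall = {obj: agg([g[obj] for g in lists]) for obj in lists[0]}
    return sorted(overall.items(), key=lambda p: (-p[1], p[0]))[:k]


def threshold_top_k(
    lists: List[Grades], agg: Aggregate, k: int
) -> Tuple[List[Tuple[str, float]], int]:
    """Sketch of TA; returns the top k and the number of sorted accesses used."""
    # Each attribute's list, ordered by grade with the highest grade first.
    sorted_lists = [sorted(g.items(), key=lambda p: (-p[1], p[0])) for g in lists]
    top: Dict[str, float] = {}  # the only buffer: at most k objects
    sorted_accesses = 0
    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for entries in sorted_lists:
            obj, grade = entries[depth]  # sorted access
            sorted_accesses += 1
            last_seen.append(grade)
            if obj in top:
                continue
            overall = agg([g[obj] for g in lists])  # random accesses
            if len(top) < k:
                top[obj] = overall
            else:
                worst = min(top, key=lambda o: top[o])
                if overall > top[worst]:
                    del top[worst]
                    top[obj] = overall
        # By monotonicity, no unseen object can have an overall grade above this.
        threshold = agg(last_seen)
        if len(top) >= k and min(top.values()) >= threshold:
            break
    return sorted(top.items(), key=lambda p: (-p[1], p[0])), sorted_accesses


# Hypothetical grades: how red and how round four objects are.
color = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
shape = {"a": 0.85, "b": 0.2, "c": 0.7, "d": 0.6}

best, accesses = threshold_top_k([color, shape], min, 1)
print(best, accesses)
print(naive_top_k([color, shape], min, 1))
```

With these sample grades and min as the aggregation function, the sketch halts after one round of sorted access for k = 1, whereas the naive baseline reads all four objects. The buffer `top` never holds more than k objects, which mirrors the bounded-buffer property claimed for TA, at the price of possibly repeating random accesses for an object that is seen again after being evicted.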


DiSC'02 © 2003 Association for Computing Machinery