2003 Digital Symposium Collection

CRB-Tree: An Efficient Indexing Scheme for Range-Aggregate Queries

Sathish Govindarajan, Pankaj K. Agarwal, and Lars Arge

Abstract

We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in $\mathbb {R}^d$, compute the aggregate of the weights of the points that lie inside a $d$-dimensional query rectangle. In this paper, we focus on the range-COUNT, range-SUM, and range-AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses $O(N/B)$ disk blocks and answers a query in $O(\log_B N)$ I/Os, where $N$ is the number of input points and $B$ is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size structure for answering range-SUM queries using $O(\log_B N)$ I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks for the aggregate of the weights of those input rectangles that overlap the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against that of the kdB-tree. For a dataset of around 100 million points, CRB-tree queries are 8-10 times faster than kdB-tree queries. Furthermore, unlike other indexing schemes, the query performance of the CRB-tree is independent of the distribution of the input points and of the placement, shape, and size of the query rectangle.
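
To make the problem definition concrete, the following is a minimal brute-force sketch of the queries the abstract describes. It is not the CRB-tree: it scans all $N$ inputs per query and only fixes what a correct answer looks like. The function names, the tuple-based data layout, and the choice of closed (boundary-inclusive) rectangles are assumptions made for illustration and do not come from the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: a rectangle is (low corner, high corner),
# a weighted point is (coordinates, weight).
Coords = Tuple[float, ...]
Rect = Tuple[Coords, Coords]


def in_rect(coords: Coords, rect: Rect) -> bool:
    """True if the point lies inside the closed rectangle (assumed convention)."""
    lo, hi = rect
    return all(l <= c <= h for l, c, h in zip(lo, coords, hi))


def range_aggregate(
    points: Sequence[Tuple[Coords, float]], query: Rect
) -> Tuple[int, float, Optional[float]]:
    """Return (COUNT, SUM, AVG) of the weights of points inside the query.

    Linear scan over all points; AVG is None when no point matches.
    """
    count, total = 0, 0.0
    for coords, weight in points:
        if in_rect(coords, query):
            count += 1
            total += weight
    avg = total / count if count else None
    return count, total, avg


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles share at least one point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


def rect_intersection_sum(
    rects: Sequence[Tuple[Rect, float]], query: Rect
) -> float:
    """SUM of the weights of input rectangles that overlap the query."""
    return sum(w for r, w in rects if rects_overlap(r, query))


points = [((1.0, 1.0), 2.0), ((3.0, 4.0), 5.0), ((6.0, 2.0), 1.0)]
result = range_aggregate(points, ((0.0, 0.0), (4.0, 5.0)))
```

A scan like this touches every input on every query, which is the cost an index with the $O(\log_B N)$ I/O bound stated above is designed to avoid. It can still serve as a reference answer when testing an index implementation.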