![]() ![]() ![]() |
![]() |
|
|
![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Return to Range Queries / Selectivity Estimation Range queries are an important class of queries for several applications including relational and spatial databases, visualization, and GIS applications. For large datasets, the performance of range queries is limited by disk I/O. Performance improvements are achieved by tiling the multi-dimensional data and distributing it among multiple disks or nodes. Consequently, in order to process a range query, it is necessary to access only those tiles or blocks that intersect with the query. Given k disks, a query that accesses m blocks needs a number of parallel block accesses that is at least dm=ke (which is known to be unachievable except for a few special cases [1]). Though several schemes for the allocation of tiles to disks have been developed, no scheme with guaranteed worst-case performance is known. We establish that any range query on a 2q x 2q-block grid of blocks can be performed using k = 2t disks (t less than or equal to q), in at most dm=ke+O(log k) parallel block accesses. We also give two natural generalizations of the scheme to higher dimensions. We achieve this result by judiciously distributing the blocks among the k nodes or disks. Experimental data show that the algorithm achieves very close to dm=ke performance (on average less than 0.5 away from dm=ke, with a worst-case of 3). Although several declustering schemes for range queries have been developed, prior to our work no additive non-trivial performance bounds were known. Our scheme guarantees performance within a (small) additive deviation from dm=ke. This guarantee is true for any number of dimensions. Subsequent to this work, Bhatiaet al. [4] have proved that such a performance bound is essentially optimal for this kind of scheme, and have also extended our results to the case where the number of disks is a product of the form k1*k2*...*kt where the kis need not be 2. Note: References link to DBLP on the Web.
Referenced by
@inproceedings{DBLP:conf/pods/AtallahP00, author = {Mikhail J. Atallah and Sunil Prabhakar}, title = {(Almost) Optimal Parallel Block Access for Range Queries}, booktitle = {Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, May 15-17, 2000, Dallas, Texas, USA}, publisher = {ACM}, year = {2000}, isbn = {1-58113-214-X}, pages = {205-215}, crossref = {DBLP:conf/pods/00}, bibsource = {DBLP, http://dblp.uni-trier.de} } }, DiSC'01 Copyright ©2002 ACM Inc. |