2003 Digital Symposium Collection

Implementing and Evaluating Warehouses and Summaries over a Cluster

Pedro Furtado


Abstract

Cluster computing power offers a promising way to improve response times in large data warehouses. At the same time, using sampling summaries on the cluster to answer OLAP queries approximately yields a very flexible system that can provide response time guarantees. In this paper we explore the cluster computation paradigm for data warehouses and summaries. Cluster computation in a network of N computers can speed up query processing about N times, and further speedup can be obtained by using samples instead of the full data. Sampling summaries have been proposed before in the context of OLAP queries to avoid query processing times that leave users and applications waiting too long when only exploratory analysis over more or less aggregated data is required. But whereas a typical one-node sampling summary is either too small to answer more detailed queries or too slow to provide almost instant response times, summaries over a cluster are extremely fast and sufficiently large to answer most aggregation query patterns. We explore the implementation and processing of the data warehouse and sampling summaries over a set of nodes for cooperative cluster computing and present experimental results.
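
The idea of combining partitioning with sampling can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes round-robin placement of fact rows over N nodes, uniform Bernoulli sampling at each node, and a coordinator that merges scaled partial sums. All function names (partition, node_partial_sum, merge, approx_group_sum) are hypothetical, and the nodes are simulated in a single process.

```python
import random
from collections import defaultdict

def partition(rows, n_nodes):
    """Place fact rows on n_nodes partitions round-robin (assumed scheme)."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def node_partial_sum(part, rate, rng):
    """Work done by one node: sample its local rows, return scaled SUM per group."""
    sums = defaultdict(float)
    for group, value in part:
        # Keep each row with probability `rate`; dividing by `rate`
        # scales the sampled sum up to an estimate of the full sum.
        if rng.random() < rate:
            sums[group] += value / rate
    return sums

def merge(partials):
    """Work done by the coordinator: add up the per-node partial sums."""
    total = defaultdict(float)
    for partial in partials:
        for group, s in partial.items():
            total[group] += s
    return dict(total)

def approx_group_sum(rows, n_nodes, rate, seed=0):
    """Estimate SUM(value) GROUP BY group from per-node samples."""
    rng = random.Random(seed)
    parts = partition(rows, n_nodes)
    return merge(node_partial_sum(p, rate, rng) for p in parts)

if __name__ == "__main__":
    rng = random.Random(42)
    facts = [(rng.choice("abc"), rng.randint(1, 100)) for _ in range(100_000)]
    print(approx_group_sum(facts, n_nodes=4, rate=1.0))   # exact sums
    print(approx_group_sum(facts, n_nodes=4, rate=0.05))  # estimates from a 5% sample
```

In this sketch each node scans only about rate/N of the rows, which is the source of the two speedups the abstract mentions. With rate set to 1.0 every row is kept, so the result equals the exact answer computed over the partitioned data.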