2001 Digital Symposium Collection

Efficient Resumption of Interrupted Warehouse Loads

Wilburt Labio, Janet L. Wiener, Hector Garcia-Molina, and Vlad Gorelik

Abstract

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one possible approach is to "redo" the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of the transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. Experiments using commercial software show that DR can lead to a ten-fold reduction in resumption time.

References

[1]: Philip A. Bernstein, Meichun Hsu, Bruce Mann: Implementing Recoverable Requests Using Queues. SIGMOD Conference 1990: 112-122
[2]: Philip A. Bernstein, Eric Newcomer: Principles of Transaction Processing for Systems Professionals. Morgan Kaufmann 1996, ISBN 1-55860-415-4
[3]: ...
[4]: Transaction Processing Performance Council. http://www.tpc.org
[5]: Jim Gray, Andreas Reuter: Transaction Processing: Concepts and Techniques. Morgan Kaufmann 1993, ISBN 1-55860-190-2
[6]: ...
[7]: ...
[8]: C. Mohan, Inderpal Narang: Algorithms for Creating Indexes for Very Large Tables Without Quiescing Updates. SIGMOD Conference 1992: 361-370
[9]: ...
[10]: ...
[11]: Janet L. Wiener, Jeffrey F. Naughton: OODB Bulk Loading Revisited: The Partitioned-List Approach. VLDB 1995: 30-41
[12]: Andrew Witkowski, Felipe Cariño, Pekka Kostamaa: NCR 3700 - The Next-Generation Industrial Database Computer. VLDB 1993: 230-243

BIBTEX

@inproceedings{DBLP:conf/sigmod/LabioWGG00,
  author    = {Wilburt Labio and
               Janet L. Wiener and
               Hector Garcia-Molina and
               Vlad Gorelik},
  editor    = {Weidong Chen and
               Jeffrey F. Naughton and
               Philip A. Bernstein},
  title     = {Efficient Resumption of Interrupted Warehouse Loads},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
               Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {46-57},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}