Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one possible approach is to "redo" the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of the transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. Experiments using commercial software show that DR can lead to a ten-fold reduction in resumption time.
@inproceedings{DBLP:conf/sigmod/LabioWGG00,
  author    = {Wilburt Labio and Janet L. Wiener and Hector Garcia-Molina and Vlad Gorelik},
  editor    = {Weidong Chen and Jeffrey F. Naughton and Philip A. Bernstein},
  title     = {Efficient Resumption of Interrupted Warehouse Loads},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {46-57},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

DiSC'01 Copyright ©2002 ACM Inc.