
























Data Mining on an OLTP System (Nearly) for Free
Erik Riedel,
Christos Faloutsos,
Gregory R. Ganger, and
David Nagle
Abstract
This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to consistently provide one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head "passes over" them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.
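The idea of reading background blocks as the head "passes over" them can be illustrated with a toy model. The sketch below is not the paper's scheduler or simulator: it assumes a single track treated as a ring of sectors, ignores seeks and timing entirely, and all names (`sectors_passed`, `serve_foreground`, `run`) are hypothetical. It only shows how blocks wanted by a background scan might be collected during the rotation that a foreground request has to wait for anyway.

```python
def sectors_passed(head: int, target: int, track_size: int) -> list[int]:
    """Sectors that rotate under the head before it reaches `target`.

    Assumption: one track, modeled as a ring of `track_size` sectors.
    """
    distance = (target - head) % track_size
    return [(head + i) % track_size for i in range(distance)]


def serve_foreground(head: int, target: int, wanted: set[int], track_size: int):
    """Serve one demand (OLTP-style) request at sector `target`.

    Returns the new head position and the background blocks that were
    picked up while waiting for `target` to rotate under the head.
    The demand read of `target` itself is not credited to the background scan.
    """
    free = [s for s in sectors_passed(head, target, track_size) if s in wanted]
    wanted.difference_update(free)          # these blocks are no longer needed
    new_head = (target + 1) % track_size    # head sits just past the demand block
    return new_head, free


def run(requests: list[int], wanted: set[int], track_size: int, head: int = 0):
    """Replay foreground requests; return (blocks picked up, blocks still wanted)."""
    remaining = set(wanted)                 # copy so the caller's set is untouched
    picked: list[int] = []
    for target in requests:
        head, free = serve_foreground(head, target, remaining, track_size)
        picked.extend(free)
    return picked, remaining


if __name__ == "__main__":
    picked, remaining = run([3, 1], {1, 2, 5, 7}, track_size=8)
    print("picked up for free:", picked)
    print("still wanted:", remaining)
```

In this toy model the foreground requests are never delayed: the background scan only receives sectors that pass under the head on the way to a demand block. A real drive would also have to account for seek time, multiple tracks, and the order in which the background task can consume blocks.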
BIBTEX
@inproceedings{DBLP:conf/sigmod/RiedelFGN00,
  author    = {Erik Riedel and
               Christos Faloutsos and
               Gregory R. Ganger and
               David Nagle},
  editor    = {Weidong Chen and
               Jeffrey F. Naughton and
               Philip A. Bernstein},
  title     = {Data Mining on an OLTP System (Nearly) for Free},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
               Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {13-21},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
DiSC'01 Copyright ©2002 ACM Inc.