2003 Digital Symposium Collection

COMBI-Operator: Database Support for Data Mining Applications

Alexander Hinneburg, Dirk Habich, and Wolfgang Lehner


Abstract

Database support for data mining has become an important research topic. Especially for large high-dimensional data volumes, comprehensive support from the database side is necessary. In this paper, we first identify the data-intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance, estimating low-dimensional histograms), which occurs in several established data mining techniques. Second, we show that existing OLAP SQL extensions are insufficient for high-dimensional data and propose a new SQL operator, which fits seamlessly into the set of existing OLAP GROUP BY operators. Third, we propose efficient implementations of the operator that take the limited resources of main memory into account. We demonstrate on a number of real and synthetic data sets that, for the identified subproblem, our new implementations yield a large speedup (up to a factor of 10) over existing methods built into commercially available database systems.
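
To make the identified subproblem concrete, the following is a minimal in-memory sketch of aggregating a data set over every low-dimensional projection, here by counting rows per distinct value combination. It is not the operator or any implementation from the paper: the function name `low_dim_group_counts`, its parameters, and the sample rows are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def low_dim_group_counts(rows, num_dims, max_k):
    """Count rows per value combination for every subset of at most
    max_k of the num_dims attributes (hypothetical helper)."""
    result = {}
    for k in range(1, max_k + 1):
        # One grouping per k-subset of attribute positions.
        for dims in combinations(range(num_dims), k):
            # Project each row onto the chosen attributes and count.
            result[dims] = Counter(tuple(row[d] for d in dims) for row in rows)
    return result


# Illustrative data only: three rows with four attributes each.
rows = [
    (0, 1, 0, 2),
    (0, 1, 1, 2),
    (1, 0, 1, 2),
]
counts = low_dim_group_counts(rows, num_dims=4, max_k=2)
```

With 4 attributes and projections of at most 2 dimensions, this produces 4 + 6 = 10 groupings. A full data cube over d attributes instead enumerates all 2^d attribute subsets; for d = 20 that is 1,048,576 groupings, compared with 20 + 190 = 210 when only one- and two-dimensional projections are needed.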