Database support for data mining has become an important research topic. Especially for large volumes of high-dimensional data, comprehensive support from the database side is necessary. First, in this paper we identify the data-intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance, estimating low-dimensional histograms), which occurs in several established data mining techniques. Second, we show that existing OLAP SQL extensions are insufficient for high-dimensional data and propose a new SQL operator, which fits seamlessly into the set of existing OLAP GROUP BY operators. Third, we propose efficient implementations of the operator that take the limited main-memory resources into account. We demonstrate on a number of real and synthetic data sets that, for the identified subproblem, our new implementations yield a large speedup (up to a factor of 10) over existing methods built into commercially available database systems.

©2004 Association for Computing Machinery
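The subproblem identified above (aggregating the data in every low-dimensional projection, e.g. to estimate histograms) can be illustrated with a small in-memory Python sketch. It assumes the attribute values are already discretized (for example, into histogram bin ids) and that the data fit in main memory. The function names `all_projection_histograms` and `grouping_counts` are hypothetical. This is not the SQL operator or the memory-bounded implementations proposed in the paper, only a direct single-scan computation of every k-dimensional projection histogram.

```python
from collections import Counter
from itertools import combinations
from math import comb


def all_projection_histograms(rows, k):
    """Count tuples in every k-dimensional projection of the data.

    Hypothetical helper for illustration only.
    rows: sequence of equal-length tuples of already-discretized values
          (assumption: e.g. bin ids); k: dimensionality of the projections.
    Returns a dict mapping each k-tuple of attribute indices to a Counter
    over the value combinations observed in those attributes.
    """
    if not rows:
        return {}
    d = len(rows[0])
    # All C(d, k) attribute subsets of size k, i.e. the low-dimensional projections.
    dims = list(combinations(range(d), k))
    hists = {dim_set: Counter() for dim_set in dims}
    # Single scan over the data: each row updates one cell in every projection.
    for row in rows:
        for dim_set in dims:
            hists[dim_set][tuple(row[i] for i in dim_set)] += 1
    return hists


def grouping_counts(d, k):
    """Return (groupings of a full CUBE over d attributes, number of k-dim projections)."""
    return 2 ** d, comb(d, k)


if __name__ == "__main__":
    data = [(0, 1, 1), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
    h = all_projection_histograms(data, 2)
    print(h[(0, 1)][(0, 1)])       # prints 2
    print(grouping_counts(20, 2))  # prints (1048576, 190)
```

For d = 20 attributes and k = 2, the sketch maintains 190 two-dimensional histograms, whereas a full CUBE over the same 20 attributes would produce 2^20 = 1,048,576 groupings. The per-row cost of this naive version grows with C(d, k) and all histograms are held in memory at once, which is the kind of resource limit the paper's implementations are stated to take into account.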