SQLEM: Fast Clustering in SQL using the EM Algorithm


Carlos Ordonez and Paul Cereghini





Abstract

Clustering is one of the most important tasks performed in data mining applications. This paper presents an efficient SQL implementation of the EM algorithm to perform clustering in very large databases. Our version can effectively handle high-dimensional data, a large number of clusters, and, more importantly, a very large number of data records. We present three strategies to implement EM in SQL: horizontal, vertical, and hybrid. We expect this work to be useful for data mining programmers and users who want to cluster large data sets inside a relational DBMS.
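
The abstract does not give any SQL, so the following is only an illustrative sketch of what a "vertical" strategy could look like, not the authors' implementation. It assumes that a vertical layout stores one row per (point, dimension) pair, and it computes the squared Euclidean distance from every point to every cluster mean with a single join and GROUP BY, a quantity a distance-based step of EM would need. The table names `Y` and `C`, the column names `rid`, `k`, `v`, `val`, and the function `squared_distances` are all hypothetical; SQLite (through Python's standard `sqlite3` module) stands in for the relational DBMS.

```python
import sqlite3


def squared_distances(points, means):
    """Return {(rid, k): squared Euclidean distance}, computed in SQL.

    Assumed (hypothetical) vertical layout:
      Y(rid, v, val): value `val` of dimension `v` for point `rid`
      C(k, v, val):   value `val` of dimension `v` for the mean of cluster `k`
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Y (rid INTEGER, v INTEGER, val REAL)")
    con.execute("CREATE TABLE C (k INTEGER, v INTEGER, val REAL)")

    # One row per (point, dimension) and one row per (cluster, dimension).
    con.executemany(
        "INSERT INTO Y VALUES (?, ?, ?)",
        [(rid, v, x) for rid, p in enumerate(points) for v, x in enumerate(p)],
    )
    con.executemany(
        "INSERT INTO C VALUES (?, ?, ?)",
        [(k, v, m) for k, mean in enumerate(means) for v, m in enumerate(mean)],
    )

    # Joining on the dimension index pairs each coordinate of a point with the
    # matching coordinate of each mean; the SUM adds up the squared differences.
    rows = con.execute(
        "SELECT Y.rid, C.k, SUM((Y.val - C.val) * (Y.val - C.val)) "
        "FROM Y JOIN C ON Y.v = C.v "
        "GROUP BY Y.rid, C.k"
    ).fetchall()
    con.close()
    return {(rid, k): d for rid, k, d in rows}


points = [(0.0, 0.0), (3.0, 4.0)]
means = [(0.0, 0.0), (3.0, 0.0)]
dist = squared_distances(points, means)
```

In this sketch the number of dimensions and clusters affects only the number of rows, not the table schema or the query text, which is one plausible reason to consider a vertical layout when both are large. A horizontal layout would presumably keep one row per point with one column per dimension instead.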


References



[1]
Charu C. Aggarwal, Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park: Fast Algorithms for Projected Clustering. SIGMOD Conference 1999: 61-72
[2]
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105
[3]
P. S. Bradley, Usama M. Fayyad, Cory Reina: Scaling Clustering Algorithms to Large Databases. KDD 1998: 9-15
[4]
...
[5]
John Clear, Debbie Dunn, Brad Harvey, Michael L. Heytens, Peter Lohman, Abhay Mehta, Mark Melton, Lars Rohrberg, Ashok Savasere, Robert M. Wehrmeister, Melody Xu: NonStop SQL/MX Primitives for Knowledge Discovery. KDD 1999: 425-429
[6]
...
[7]
Richard Dubes, Anil K. Jain: Clustering Methodologies in Exploratory Data Analysis. Advances in Computers 19: 113-228 (1980)
[8]
...
[9]
William DuMouchel, Chris Volinsky, Theodore Johnson, Corinna Cortes, Daryl Pregibon: Squashing Flat Files Flatter. KDD 1999: 6-15
[10]
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231
[11]
Alexander Hinneburg, Daniel A. Keim: Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. VLDB 1999: 506-517
[12]
...
[13]
F. Murtagh: A Survey of Recent Advances in Hierarchical Clustering Algorithms. The Computer Journal 26(4): 354-359 (1983)
[14]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[15]
Carlos Ordonez, Edward Omiecinski: Discovering Association Rules Based on Image Content. ADL 1999: 38-49
[16]
...
[17]
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114

BIBTEX


@inproceedings{DBLP:conf/sigmod/OrdonezC00,
  author    = {Carlos Ordonez and
               Paul Cereghini},
  editor    = {Weidong Chen and
               Jeffrey F. Naughton and
               Philip A. Bernstein},
  title     = {SQLEM: Fast Clustering in SQL using the EM Algorithm},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
               Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {559-570},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}




DiSC'01 Copyright ©2002 ACM Inc.