Algorithms: For Mining Association Rules for Binary Segmentations of Huge Categorical Databases
Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda
Full Paper (PDF)

Abstract
We consider the problem of finding association rules that make nearly optimal binary segmentations of huge categorical databases. The optimality of segmentation is defined by an objective function suitable for the user's objective. An objective function is usually defined in terms of the distribution of agiven target attribute. Our goal is to find association rules that split databases into two subsets, optimizing the value of an objective function.

The problem is intractable for general objective functions, because letting N be the number of records of a given database, there are 2N possible binary segmentations, and we may have to exhaustively examine all of them. However, when the objective function is convex, there are feasible algorithms for finding nearly optimal binary segmentations, and we prove that typical criteria, such as "entropy (mutual information)," "x2 (correlation)," and "gini index (mean squared error)," are actually convex.

We propose practical algorithms that use computational geometry techniquesto handle cases where a target attribute is not binary, in which conventional approaches cannot be used directly.


References

References, where available, link to the DBLP on the World Wide Web.

[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[AT94]
...
[BEHW89]
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, Manfred K. Warmuth: Learnability and the Vapnik-Chervonenkis Dimension. JACM 36(4): 929-965(1989)
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
[Cen92]
...
[DEY86]
David P. Dobkin, Herbert Edelsbrunner, Chee K. Yap: Probing Convex Polytopes. STOC 1986: 424-432
[FMMT96a]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155
[FMMT96b]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conf. 1996: 13-23
[FMMT96c]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191
[HF95]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[HII95]
Susumu Hasegawa, Hiroshi Imai, Masaki Ishiguro: epsilon-Approximations of k-Label Spaces. TCS 137(1): 145-157(1995)
[HW87]
...
[MFMT97]
...
[MP91]
...
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[PS85]
Franco P. Preparata, Michael Ian Shamos: Computational Geometry - An Introduction. Springer 1985, ISBN 3-540-96131-3
[PS91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases. 1991: 229-248
[Qui86]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1: 81-106(1986)
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
Referenced By:
  1. Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. VLDB 1998: 416-427
BIBTEX

@inproceedings{DBLP:conf/vldb/MorimotoFMTY98,
author = {Yasuhiko Morimoto and
Takeshi Fukuda and
Hirofumi Matsuzawa and
Takeshi Tokuyama and
Kunikazu Yoda},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {Algorithms for Mining Association Rules for Binary Segmentations
of Huge Categorical Databases},
booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {380-391},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).