Exploratory Mining and Pruning Optimizations of Constrained Association Rules
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang
Full Paper (PDF)

Abstract
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraint-based, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of constraint constructs, including domain, class, and sql-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose "constrained association queries" as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association.

In this paper, we mainly focus on the technical challenges in guaranteeing a level of performance that is commensurate with the selectivities of the constraints in an association query. To this end, we introduce and analyze two properties of constraints that are critical to pruning: "anti-monotonicity" and "succinctness". We then develop characterizations of various constraints into four categories, according to these properties. Finally, we describe a mining algorithm called CAP, which achieves a maximized degree of pruning for all categories of constraints. Experimental results indicate that CAP can run much faster, in some cases as much as 80 times, than several basic algorithms. This demonstrates how important the succinctness and anti-monotonicity properties are, in delivering the performance guarantee.

References

References, where available, link to the DBLP on the World Wide Web.

[1]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[2]
Rakesh Agrawal, John C. Shafer: Parallel Mining of Association Rules. TKDE 8(6): 962-969(1996)
[3]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[4]
Elena Baralis, Giuseppe Psaila: Designing Templates for Mining Association Rules. JIIS 9(1): 7-32(1997)
[5]
Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
[6]
David Wai-Lok Cheung, Jiawei Han, Vincent Ng, C. Y. Wong: Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996: 106-114
[7]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conf. 1996: 13-23
[8]
Eui-Hong Han, George Karypis, Vipin Kumar: Scalable Parallel Data Mining for Association Rules. SIGMOD Conference 1997: 277-288
[9]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[10]
Joseph M. Hellerstein, Peter J. Haas, Helen Wang: Online Aggregation. SIGMOD Conference 1997: 171-182
[11]
Tomasz Imielinski, Heikki Mannila: A Database Perspective on Knowledge Discovery. CACM 39(11): 58-64(1996)
[12]
...
[13]
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407
[14]
Brian Lent, Arun N. Swami, Jennifer Widom: Clustering Association Rules. ICDE 1997: 220-231
[15]
Rosa Meo, Giuseppe Psaila, Stefano Ceri: A New SQL-like Operator for Mining Association Rules. VLDB 1996: 122-133
[16]
R. J. Miller, Yuping Yang: Association Rules over Interval Data. SIGMOD Conference 1997: 452-461
[17]
...
[18]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[19]
Ashoka Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444
[20]
Abraham Silberschatz, Stanley B. Zdonik: Database Systems - Breaking Out of the Box. SIGMOD Record 26(3): 36-50(1997)
[21]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[22]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conf. 1996: 1-12
[23]
...
[24]
Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
[25]
...
BIBTEX

@inproceedings{DBLP:conf/sigmod/NgLHP98,
author = {Raymond T. Ng and
Laks V. S. Lakshmanan and
Jiawei Han and
Alex Pang},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Exploratory Mining and Pruning Optimizations of Constrained Association
Rules},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-955-5},
pages = {13-24},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).