2001 Digital Symposium Collection

Mining Frequent Patterns without Candidate Generation

Jiawei Han, Jian Pei, and Yiwen Yin
View Paper (PDF)

Return to Research Sessions

Abstract

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.

In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.

References

Note: References link to DBLP on the Web.

[1]: ...
[2]: ...
[3]: Rakesh Agrawal , Ramakrishnan Srikant : Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994 : 487-499
[4]: Rakesh Agrawal , Ramakrishnan Srikant : Mining Sequential Patterns. ICDE 1995 : 3-14
[5]: Roberto J. Bayardo Jr. : Efficiently Mining Long Patterns from Databases. SIGMOD Conference 1998 : 85-93
[6]: Sergey Brin , Rajeev Motwani , Craig Silverstein : Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997 : 265-276
[7]: Guozhu Dong , Jinyan Li : Efficient Mining of Emerging Patterns: Discovering Trends and Differences. KDD 1999 : 43-52
[8]: Gösta Grahne , Laks V. S. Lakshmanan , Xiaohong Wang : Efficient Mining of Constrained Correlated Sets. ICDE 2000 : 512-521
[9]: Jiawei Han , Guozhu Dong , Yiwen Yin : Efficient Mining of Partial Periodic Patterns in Time Series Database. ICDE 1999 : 106-115
[10]: ...
[11]: Micheline Kamber , Jiawei Han , Jenny Chiang : Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes. KDD 1997 : 207-210
[12]: Mika Klemettinen , Heikki Mannila , Pirjo Ronkainen , Hannu Toivonen , A. Inkeri Verkamo : Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994 : 401-407
[13]: Brian Lent , Arun N. Swami , Jennifer Widom : Clustering Association Rules. ICDE 1997 : 220-231
[14]: Heikki Mannila , Hannu Toivonen , A. Inkeri Verkamo : Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery 1(3) : 259-289(1997)
[15]: Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang : Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998 : 13-24
[16]: Jong Soo Park , Ming-Syan Chen , Philip S. Yu : An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995 : 175-186
[17]: Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal : Integrating Mining with Relational Database Systems: Alternatives and Implications. SIGMOD Conference 1998 : 343-354
[18]: Ashoka Savasere , Edward Omiecinski , Shamkant B. Navathe : An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995 : 432-444
[19]: Craig Silverstein , Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman : Scalable Techniques for Mining Causal Structures. VLDB 1998 : 594-605
[20]: Ramakrishnan Srikant , Quoc Vu , Rakesh Agrawal : Mining Association Rules with Item Constraints. KDD 1997 : 67-73

BIBTEX

@inproceedings{DBLP:conf/sigmod/HanPY00,
  author    = {Jiawei Han and
                Jian Pei and
                Yiwen Yin},
   editor    = {Weidong Chen and
                Jeffrey F. Naughton and
                Philip A. Bernstein},
   title     = {Mining Frequent Patterns without Candidate Generation},
   booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
                Management of Data, May 16-18, 2000, Dallas, Texas, USA},
   journal   = {SIGMOD Record},
   publisher = {ACM},
   volume    = {29},
   number    = {2},
   year      = {2000},
   isbn      = {1-58113-218-2},
   pages     = {1-12},
   crossref  = {DBLP:conf/sigmod/2000},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },