 |


















|
|
Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy | Full Paper (PDF)
Association rule mining recently attracted strong attention. Usually, the classification hierarchy over the data items is available. Users are interested in generalized association rules that span different levels of the hierarchy, since sometimes more interesting rules can be derived by taking the hierarchy into account.
In this paper, we propose the new parallel algorithms for mining association rules with classification hierarchy on a shared-nothing parallel machine to improve its performance. Our algorithms partition the candidate itemsets over the processors, which exploits the aggregate memory of the system effectively. If the candidate itemsets are partitioned without considering classification hierarchy, both the items and its all the ancestor items have to be transmitted, that causes prohibitively large amount of communications. Our method minimizes interprocessor communication by considering the hierarchy. Moreover, in our algorithm, the available memory space is fully utilized by identifying the frequently occurring candidate itemsets and copying them over all the processors, through which frequent itemsets can be processed locally without any communication. Thus it can effectively reduce the load skew among the processors. Several experiments are done by changing the granule of copying itemsets, from the whole tree, to the small group of the frequent itemsets along the hierarchy. The coarser the grain, the easier the control but it is rather difficult to achieve the sufficient load balance. The finer the grain, the more complicated the control is required but it can balance the load quite well.
We implemented proposed algorithms on IBM SP-2. Performance evaluations show that our algorithms are effective for handling skew and attain sufficient speedup ratio. |
References, where available, link to the DBLP on the World Wide Web.
[AS96]Rakesh Agrawal, John C. Shafer:
Parallel Mining of Association Rules.
TKDE 8(6): 962-969(1996)[CHN+96]David Wai-Lok Cheung, Jiawei Han, Vincent Ng, Ada Wai-Chee Fu, Yongjian Fu:
A Fast Distributed Algorithm for Mining Association Rules.
PDIS 1996: 31-42[CNFF96]David Wai-Lok Cheung, Vincent Ng, Ada Wai-Chee Fu, Yongjian Fu:
Efficient Mining of Association Rules in Distributed Databases.
TKDE 8(6): 911-922(1996)[HKK97]Eui-Hong Han, George Karypis, Vipin Kumar:
Scalable Parallel Data Mining for Association Rules.
SIGMOD Conference 1997: 277-288[PCY95]Jong Soo Park, Ming-Syan Chen, Philip S. Yu:
Efficient Parallel and Data Mining for Association Rules.
CIKM 1995: 31-36[RR94]Rakesh Agrawal, Ramakrishnan Srikant:
Fast Algorithms for Mining Association Rules in Large Databases.
VLDB 1994: 487-499[SA95]Ramakrishnan Srikant, Rakesh Agrawal:
Mining Generalized Association Rules.
VLDB 1995: 407-419[SA96]Ramakrishnan Srikant, Rakesh Agrawal:
Mining Sequential Patterns: Generalizations and Performance Improvements.
EDBT 1996: 3-17[SK96]Takahiko Shintani, Masaru Kitsuregawa:
Hash Based Parallel Algorithms for Mining Association Rules.
PDIS 1996: 19-30[SK98]...
|
@inproceedings{DBLP:conf/sigmod/ShintaniK98, author = {Takahiko Shintani and Masaru Kitsuregawa}, editor = {Laura M. Haas and Ashutosh Tiwary}, title = {Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy}, booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA}, publisher = {ACM Press}, year = {1998}, isbn = {0-89791-955-5}, pages = {25-36}, crossref = {DBLP:conf/sigmod/98}, bibsource = {DBLP, http://dblp.uni-trier.de} }
|
DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).
|
|