Computing frequent itemsets and maximally frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.

@inproceedings{DBLP:conf/pods/RameshMZ03,
  author    = {Ganesh Ramesh and William Maniatty and Mohammed Javeed Zaki},
  booktitle = {PODS},
  title     = {Feasible itemset distributions in data mining: theory and application.},
  pages     = {284-295},
  year      = {2003},
  url       = {db/conf/pods/pods2003.html#RameshMZ03},
  ee        = {http://doi.acm.org/10.1145/773153.773181},
  crossref  = {conf/pods/2003},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

©2004 Association for Computing Machinery
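The abstract refers to the length distributions of frequent and maximal frequent itemset collections. As a minimal illustration of what such a distribution is (not the authors' method for computing lower bounds), the Python sketch below enumerates the frequent itemsets of a toy transaction database with a brute-force, Apriori-style level-wise search. It then extracts the maximal ones and counts how many itemsets of each length occur. The function names, the use of an absolute support count as the threshold, and the sample data are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of every itemset contained in
    at least `min_support` transactions (absolute support count, assumed)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        # Naive support counting: scan every transaction.
        return sum(1 for t in transactions if itemset <= t)

    frequent = set()
    # Level 1: frequent single items.
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    k = 1
    while level:
        frequent |= level
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: by downward closure, every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {f for f in frequent if not any(f < g for g in frequent)}


def length_distribution(itemsets):
    """Map each itemset length to the number of itemsets of that length."""
    return dict(sorted(Counter(len(s) for s in itemsets).items()))


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    freq = frequent_itemsets(db, min_support=2)
    print("frequent:", length_distribution(freq))
    print("maximal: ", length_distribution(maximal_itemsets(freq)))
```

On this toy database with `min_support=2`, the items `a`, `b`, and `c` each appear in four transactions, every pair appears in three, and `{a, b, c}` appears in two, while `d` appears only once. The frequent-itemset length distribution is therefore `{1: 3, 2: 3, 3: 1}`, and the single maximal itemset `{a, b, c}` gives the maximal distribution `{3: 1}`. The brute-force support counting keeps the sketch short; real miners use far more efficient counting, and their cost depends on these distributions.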