Mining Surprising Patterns Using Temporal Description Length
Soumen Chakrabarti, Sunita Sarawagi, Byron Dom
Full Paper (PDF)

Abstract
We propose a new notion of surprising temporal patterns in market basket data, and algorithms to find such patterns. This is distinct from finding frequent patterns as addressed in the common mining literature. We argue that once the analyst is already familiar with prevalent patterns in the data, the greatest incremental benefit is likely to be from changes in the relationship between item frequencies over time.

A simple measure of surprise is the extent of departure from a model, estimated using standard multivariate time series analysis. Unfortunately, such estimation involves models, smoothing windows and parameters whose optimal choices can vary dramatically from one application to another. In contrast, we propose a precise characterization of surprise based on the number of bits in which a basket sequence can be encoded under a carefully chosen coding scheme. In this scheme it is inexpensive to encode sequences of itemsets that have steady, hence likely to be well-known, correlation between items. Conversely, a sequence with large code length hints at a possibly surprising correlation.

Our notion of surprise also has the desirable property that the score of aset of items is offset by anything surprising that the user may already know from the marginal distribution of any proper subset. No parameters, such as support, confidence, or smoothing windows, need to be estimated or specified by the user.

We experimented with real-life market basket data. The algorithm successfully rejected a large number of frequent sets of items that bore obvious and steady complementary relations to each other, such as cereal and milk. Instead, our algorithm found itemsets that showed statistically strong fluctuations in correlation over time. These items had no obviously complementary roles.


References

References, where available, link to the DBLP on the World Wide Web.

[1]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. TKDE 5(6): 914-925(1993)
[2]
...
[3]
Rakesh Agrawal, Giuseppe Psaila: Active Data Mining. KDD 1995: 3-8
[4]
Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, Mohamed Zaït: Querying Shapes of Histories. VLDB 1995: 502-514
[5]
...
[6]
...
[7]
...
[8]
...
[9]
...
[10]
David Wai-Lok Cheung, Jiawei Han, Vincent Ng, C. Y. Wong: Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996: 106-114
[11]
...
[12]
...
[13]
...
[14]
...
[15]
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407
[16]
...
[17]
Brian Lent, Rakesh Agrawal, Ramakrishnan Srikant: Discovering Trends in Text Databases. KDD 1997: 227-230
[18]
Banu Özden, Sridhar Ramaswamy, Abraham Silberschatz: Cyclic Association Rules. ICDE 1998: 412-421
[19]
...
[20]
...
[21]
Abraham Silberschatz, Alexander Tuzhilin: What Makes Patterns Interesting in Knowledge Discovery Systems. TKDE 8(6): 970-974(1996)
[22]
Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
[23]
...
[24]
...
BIBTEX

@inproceedings{DBLP:conf/vldb/ChakrabartiSD98,
author = {Soumen Chakrabarti and
Sunita Sarawagi and
Byron Dom},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {Mining Surprising Patterns Using Temporal Description Length},
booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {606-617},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


DBLP: Copyright ©1999 by Michael Ley (ley@uni-trier.de).