Contributed articles
Discovering matrix association from biological databases
G. B. Singh
(available in PDF)
ABSTRACT: Biological databases have continued their exponential growth over the last decade, and data mining holds considerable promise for knowledge discovery in these databases. Discovering the elements of locus control in genetic sequences is a significant problem, as these elements are responsible for gene expression and the viability of an organism. Matrix Attachment Regions (MARs) are one such type of element, and their detection has been hampered by limited knowledge of their structure. A discovery approach that uses statistical estimation of "interestingness" has been implemented in the MARWiz software described in this paper. The strategy is generally applicable to detecting other classes of signals in time-series or DNA sequence data.
A note on "Beyond Market Baskets: Generalizing association rules to correlations"
K.M. Ahmed, N.M. El-Makky, Y. Taha
(available in PDF)
ABSTRACT: In their paper \cite{dm1}, S. Brin, R. Motwani and C. Silverstein discussed measuring the significance of (generalized) association rules via the support and the chi-squared test for correlation. They provided some illustrative examples and pointed out that the chi-squared test needs to be augmented by a measure of interest, which they also suggested.
This paper presents a further elaboration and extension of their discussion. As Brin et al. suggested, the chi-squared test succeeds in measuring the cell dependencies in a 2x2 contingency table. However, it can be misleading for larger contingency tables. We will give some illustrative examples based on those presented in \cite{dm1}. We will also propose a more appropriate reliability measure for association rules.
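As a rough illustration of the quantities this abstract refers to (not the authors' code), the sketch below computes the chi-squared statistic for a contingency table of observed counts, together with a per-cell interest value. It assumes that interest is the ratio of the observed count to the count expected under independence, and that every row and column total is non-zero. The function name `chi_squared_and_interest` and the example counts are hypothetical.

```python
def chi_squared_and_interest(table):
    """Return (chi-squared statistic, per-cell interest) for observed counts.

    Assumptions: interest of a cell is observed / expected under independence,
    and all row and column totals are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    interest = []
    for i, row in enumerate(table):
        interest_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            interest_row.append(observed / expected)
        interest.append(interest_row)
    return chi2, interest


# Hypothetical counts for 100 baskets
# (rows: tea / no tea, columns: coffee / no coffee).
counts = [[20, 5], [70, 5]]
chi2, interest = chi_squared_and_interest(counts)
print(round(chi2, 3))
for row in interest:
    print([round(value, 3) for value in row])
```

The single chi-squared value summarizes the whole table, while the interest values show which individual cells deviate from independence and in which direction. This is one way to see why a per-cell measure is a useful complement as tables grow beyond 2x2.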
Reports from the KDD-99 Conference
KDD-99: The Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
S. Chaudhuri, D. Madigan, U. Fayyad
(available in PDF)
ABSTRACT: KDD-99 was the fifth conference in the KDD series, attracting over 200 high-quality submissions and almost 600 attendees. Here we describe some of the highlights of the technical program.
Keywords: KDD Conference overview, ACM SIGKDD.
Data Snooping, Dredging and Fishing: The Dark Side of Data Mining
D. Jensen
(available in PDF)
ABSTRACT: This article briefly describes a panel discussion at SIGKDD99.
Keywords: Overfitting, SIGKDD99, Panels
Integrating Data Mining into Vertical Solutions
R. Kohavi, M. Sahami
(available in PDF)
ABSTRACT: At KDD-99, the panel on Integrating Data Mining into Vertical Solutions addressed a series of questions regarding future trends in industrial applications. Panelists were chosen to represent different viewpoints from a variety of industry segments, including data providers (Jim Bozik), horizontal and vertical tool providers (Ken Ono and Steve Belcher respectively), and data mining consultants (Rob Gerritsen and Dorian Pyle). Questions presented to the panelists included whether data mining companies should sell solutions or tools, who are the users of data mining, will data mining functionality be integrated into databases, do models need to be interpretable, what is the future of horizontal and vertical tool providers, and will industry-standard APIs be adopted?
Knowledge Discovery in Databases: Ten years after
G. Piatetsky-Shapiro
(available in PDF)
ABSTRACT: In this paper, we describe the past 10 years of KDD and outline predictions for the next 10 years.
Keywords: Knowledge Discovery in Databases, Data Mining, KDD, History.
Knowledge Discovery in Databases: A discussion on the last 10 and next 10 years
R. Quinlan
(available in PDF)
ABSTRACT: This paper presents the author's impressions of the panel with the above title, held at KDD-99.
KDD-99 Classifier learning contest
Overview: C. Elkan
(available in PDF)
ABSTRACT: This paper presents a summary of the results of the classifier learning track of the KDD cup competition held at KDD-99.
First winner report: B. Pfahringer
(available in PDF)
ABSTRACT: The first place winners of the classifier learning contest describe their method in this report.
Second winner report: I. Levin
(available in PDF)
ABSTRACT: Kernel Miner is a new data-mining tool based on building the optimal decision forest. The tool won second place in the KDD'99 Classifier Learning Contest, August 1999. We describe Kernel Miner's approach and the method used for solving the contest task. The results obtained are analyzed and explained.
Keywords: Data Mining competition, decision trees, optimal decision forest, classification, prediction.
Third winner report: M. Vladimir, V. Alexei, S. Ivan
(available in PDF)
ABSTRACT: The MP13 method is best summarized as recognition based on voting decision trees using "pipes" in potential space.
Keywords: Voting; Decision Tree; Potential Space
KDD-99 Knowledge discovery contest
Overview: C. Elkan
(available in PDF)
ABSTRACT: This paper presents a summary of the results of the knowledge discovery track of the KDD cup competition held at KDD-99.
Co-winner 1: J. Georges, A. H. Milley
(available in PDF)
ABSTRACT: In this paper, we expand on the 1998 KDD cup competition findings: exploratory data analysis reveals unusual data anomalies; a two-stage prediction model yields superior results to those obtained in the 1998 competition; we use a decision tree to better understand the model (the decision boundary); and we apply a confidence interval to establish a range upon which we can reasonably judge model performance.
Keywords: Two-stage prediction, neural network, decision tree, model performance.
Co-winner 2: S. Rosset and A. Inger
(available in PDF)
ABSTRACT: This report describes the results of our knowledge discovery and modeling on the data of the 1997 donation campaign of an American charitable organization.
Honorary mention: P. Sebastiani, M. Ramoni, and A. Crea
(available in PDF)
ABSTRACT: This report describes a complete Knowledge Discovery session using Bayesware Discoverer, a program for the induction of Bayesian networks from incomplete data. We build two causal models to help an American Charitable Organization understand the characteristics of respondents to direct mail fund raising campaigns. The first model is a Bayesian network induced from the database of 96,376 Lapsed donors to the June '97 renewal mailing. The network describes the dependency of the probability of response to the renewal mail on a subset of the variables in the database. The second model is a Bayesian network representing the dependency of the dollar amount of the gift on the variables in the same reduced database. This model is induced from the 5% of cases in the database corresponding to the respondents to the renewal campaign. The two models are used for both predicting the expected gift of a donor and understanding the characteristics of donors. These two uses can help the charitable organization to maximize the profit.
Keywords: Bayesian Networks, Customer Profiling, Missing Data
Other conference reports
Interface 99: A Data Mining Overview
A. Goodman
(available in PDF)
Discovering geographic knowledge in data rich environments: a report on a specialist meeting
H.J. Miller and J. Han
(available in PDF)
ABSTRACT: On 18-20 March 1999, a Specialist Meeting on Discovering geographic knowledge in data-rich environments was convened under the auspices of the Varenius Project of the National Center for Geographic Information and Analysis (NCGIA). This workshop brought together a diverse group of researchers and practitioners with interests in developing and applying new techniques for exploring large and diverse geographic datasets. The interaction prior to, during and after the three-day workshop resulted in the identification of research priorities and directions for continued development of geographic knowledge discovery (GKD) theory and techniques.
Keywords: Geographic data mining, spatio-temporal data mining, geographic information systems, geographic research.
WebKDD-99: Workshop on Web Usage Analysis and User Profiling
Brij Masand, Dr. Myra Spiliopoulou
(available in PDF)
ABSTRACT: The WEBKDD'99 workshop on "Web Usage Analysis and User Profiling" took place on Aug. 15, 1999 under the auspices of the SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99). We report on the topics addressed in the workshop, the contributions and the discussions that took place in its framework.
Keywords: Web usage mining
KDD-99 Workshop on Large-Scale Parallel KDD systems
M. Zaki, C.T. Ho
(available in PDF)
SIGMOD 99 Workshop on research issues in data mining and knowledge discovery
K. Shim, R. Srikant
(available in PDF)
Interesting KDD news from SIGMOD 99
D. Keim
(available in PDF)
Book Reviews
Data Mining Methods for Knowledge Discovery
by K. Cios, W. Pedrycz and R. Swiniarski, Kluwer
(available in PDF)
ABSTRACT: This paper is a review of the book "Data Mining Methods for Knowledge Discovery", by K. Cios, W. Pedrycz and R. Swiniarski, Kluwer 1998, 495 pp.
Keywords: Data mining, Book review.
News, Events and Announcements
(available in PDF)