ICDE'00

CMP: A Fast Decision Tree Classifier Using Multivariate Predictions


H. Wang and C. Zaniolo


Return to New Trends in Data Mining


Abstract


Most decision tree classifiers keep class histograms for single attributes and use those histograms to select the attribute for the next split. In this paper, we propose a technique that keeps histograms on attribute pairs and thereby achieves (i) a significant speed-up over traditional classifiers based on single-attribute splitting, and (ii) the ability to build classifiers that use linear combinations of values from pairs of non-categorical attributes as the split criterion. Indeed, by keeping two-dimensional histograms, CMP can often predict the best successive split in addition to computing the current one; therefore, CMP is normally able to grow more than one level of a decision tree for each data scan. CMP's performance improvements are also due to techniques that discretize non-categorical attributes without loss of classification accuracy; in fact, we introduce simple techniques whereby classification errors caused by discretization at one step can be corrected in the following step. In summary, CMP is a unified algorithm that extends the functionality of existing classifiers and improves their performance.
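
The idea of growing two levels from one scan can be illustrated with a small sketch. This is not the CMP algorithm itself: the abstract does not give its data structures or its impurity measure, so the Gini index, the helper names `pair_histogram` and `best_threshold`, and the toy data below are all assumptions made for illustration. The sketch builds a class histogram over a pair of already-discretized attributes in a single pass, picks the best split on the first attribute, and then evaluates a split on the second attribute inside one child using the same histogram, with no second pass over the rows.

```python
from collections import Counter

def pair_histogram(rows, i, j):
    """Single scan: count (bin of attribute i, bin of attribute j, class label)."""
    hist = Counter()
    for *attrs, label in rows:
        hist[(attrs[i], attrs[j], label)] += 1
    return hist

def gini(counts):
    """Gini impurity of a class-count table (assumed impurity measure)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(hist, axis, keep=lambda key: True):
    """Best 'bin <= t' split along one axis of the pair histogram.

    Only cells for which keep(key) is true are considered, which lets the
    caller restrict the search to one child of an earlier split.
    Returns (weighted Gini, threshold), or (inf, None) if no split exists.
    """
    cells = [(k, c) for k, c in hist.items() if keep(k)]
    bins = sorted({k[axis] for k, _ in cells})
    total = sum(c for _, c in cells)
    best = (float("inf"), None)
    for t in bins[:-1]:
        left, right = Counter(), Counter()
        for k, c in cells:
            (left if k[axis] <= t else right)[k[2]] += c
        score = (sum(left.values()) * gini(left)
                 + sum(right.values()) * gini(right)) / total
        if score < best[0]:
            best = (score, t)
    return best

# Hypothetical toy data: (bin of x, bin of y, class label).
rows = [
    (0, 0, "A"), (0, 1, "A"), (0, 3, "A"), (1, 0, "A"), (1, 1, "A"), (1, 2, "A"),
    (2, 0, "B"), (3, 1, "B"), (2, 2, "A"), (3, 3, "A"),
]

hist = pair_histogram(rows, 0, 1)                 # the only pass over the rows
root_score, root_t = best_threshold(hist, 0)      # current split, on x
child_score, child_t = best_threshold(            # next split, on y, within x > root_t
    hist, 1, keep=lambda k: k[0] > root_t)
print(root_t, child_t)
```

In this sketch the second-level split is exact only because the first split falls on a bin boundary of the histogram; how CMP handles splits that cut through bins, and how it forms linear-combination splits, is described in the paper rather than here.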



DiSC'01 Copyright ©2002 ACM Inc.