2003 Digital Symposium Collection

A Nearest-Neighbor Method as a Data Preparation Tool for a Clustering Genetic Algorithm

Eduardo R. Hruschka, Estevam R. Hruschka Jr., and Nelson F. F. Ebecken


Abstract

This paper presents a Nearest-Neighbor Method for substituting missing values in continuous datasets and shows that it can be useful as a data preparation step for a Clustering Genetic Algorithm. The proposed method is evaluated by means of simulations on the Wisconsin Breast Cancer Dataset, which is a benchmark for data mining methods. Specifically, we verify the efficacy of the proposed method in the context of a Clustering Genetic Algorithm by comparing the average classification rates obtained on the original dataset with those obtained on a dataset formed with the substituted values. The simulation results show that the proposed method is promising.
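
The abstract does not give the details of the substitution procedure, so the following is only a minimal sketch of nearest-neighbor imputation in general, not necessarily the authors' method. It assumes missing values are marked with `None`, that distance is Euclidean over the attributes observed in both rows (normalized by their count), and that the missing value is replaced by the mean of that attribute over the `k` nearest rows in which it is observed. The function name `nn_impute` and the parameter `k` are hypothetical.

```python
import math


def nn_impute(rows, k=1):
    """Return a copy of rows with None entries filled from the k nearest rows.

    Illustrative helper only; the distance and the averaging rule are assumptions.
    """
    def distance(a, b):
        # Euclidean distance over attributes observed in both rows,
        # divided by the number of shared attributes before the square root.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common attributes: rows are not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # the input is left unmodified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which attribute j is observed.
            # Distances use the original rows, not already-imputed values.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # if no donor exists, the value stays None
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [[1.0, 2.0, 3.0],
        [1.0, 2.0, None],
        [9.0, 9.0, 10.0]]
print(nn_impute(data)[1])  # prints [1.0, 2.0, 3.0]
```

The kind of evaluation described in the abstract would then run the clustering algorithm on both the original data and the output of such a substitution step and compare the resulting classification rates.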