2006 Digital Symposium Collection

Example-Based Robust Outlier Detection in High Dimensional Datasets

Cui Zhu, Hiroyuki Kitagawa, and Christos Faloutsos
View Paper (PDF)

Return to Session 18: Statistical Methods II

Abstract

Detecting outliers is an important problem. Most of its applications typically possess high dimensional datasets. In high dimensional space, the data becomes sparse which implies that every object can be regarded as an outlier from the point of view of similarity. Furthermore, a fundamental issue is that the notion of which objects are outliers typically varies between users, problem domains or, even, datasets. In this paper, we present a novel robust solution which detects high dimensional outliers based on user examples and tolerates incorrect inputs. It studies the behavior of projections of such a few examples, to discover further objects that are outstanding in the projection where many examples are outlying. Our experiments on both real and synthetic datasets demonstrate the ability of the proposed method to detect outliers corresponding to the user examples.