Outlier Ranking for High-Dimensional Data

Research topic: Data Analysis and Knowledge Extraction

Detecting outliers is an important task in many applications, including fraud detection and consistency validation of real-world data. Particularly when data are uncertain or imprecise, similar objects regularly deviate in their attribute values, so the notion of an outlier has to be defined carefully. When outlier detection is viewed as complementary to clustering, a binary decision on whether or not an object is an outlier seems natural. In high-dimensional data, however, an object may belong to different clusters in different subspaces, so more fine-grained definitions of outliers are needed. With our new outlier ranking approaches, we address outlier detection in subspaces of high-dimensional data. We propose novel scoring functions that provide consistent models for ranking outliers in the presence of object deviation in arbitrary subspace projections.
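To illustrate why a ranking over subspaces can reveal objects that look normal in every single attribute, here is a minimal sketch. It is not the scoring function proposed here; it is a hypothetical stand-in under simple assumptions: each axis-parallel subspace with up to `max_dim` attributes is enumerated exhaustively, an object's raw score in a subspace is its k-nearest-neighbor distance, raw scores are divided by the subspace mean so that subspaces of different dimensionality are comparable, and each object keeps its maximum normalized score. The names `knn_distance` and `subspace_outlier_ranking` are illustrative only.

```python
import itertools
import math


def knn_distance(points, idx, dims, k):
    """Distance from points[idx] to its k-th nearest neighbor,
    measured only on the attributes listed in `dims`."""
    p = [points[idx][d] for d in dims]
    dists = sorted(
        math.dist(p, [q[d] for d in dims])
        for j, q in enumerate(points)
        if j != idx
    )
    return dists[k - 1]


def subspace_outlier_ranking(points, k=1, max_dim=2):
    """Hypothetical subspace outlier ranking (not the proposed method).

    Returns (ranking, scores): ranking lists object indices from most
    to least outlying; scores holds each object's aggregated score.
    """
    n = len(points)
    n_attrs = len(points[0])
    scores = [0.0] * n
    # Enumerate every axis-parallel subspace with 1..max_dim attributes.
    # The number of subspaces grows combinatorially, hence the cap.
    for size in range(1, max_dim + 1):
        for dims in itertools.combinations(range(n_attrs), size):
            raw = [knn_distance(points, i, dims, k) for i in range(n)]
            mean = sum(raw) / n
            if mean == 0:
                continue  # all objects coincide here; no information
            for i in range(n):
                # Normalize by the subspace mean, then keep each
                # object's strongest deviation across subspaces.
                scores[i] = max(scores[i], raw[i] / mean)
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Nine objects on the diagonal plus (0, 8), which breaks the correlation.
data = [(i, i) for i in range(9)] + [(0, 8)]
ranking, scores = subspace_outlier_ranking(data)
print(ranking[0])  # → 9
```

In this toy data, each attribute value of the object (0, 8) coincides with that of another object, so it is unremarkable in both one-dimensional projections; only the two-dimensional subspace exposes it, where its normalized score is 40/13 ≈ 3.08 compared with at most 1.25 for every other object. Taking the maximum over subspaces is one simple aggregation choice; it flags an object that deviates strongly in at least one projection.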