Outlier detection and ranking based on subspace clustering

Detecting outliers is an important task for many applications, including fraud detection and consistency validation in real-world data. Particularly in the presence of uncertain or imprecise data, similar objects regularly deviate in their attribute values, so the notion of an outlier has to be defined carefully. When outlier detection is considered a task complementary to clustering, a binary decision on whether an object is an outlier suggests itself. For high-dimensional data, however, objects may belong to different clusters in different subspaces, which calls for more fine-grained outlier concepts. With our new OutRank approach, we address outlier detection in heterogeneous high-dimensional data and propose a novel scoring function that provides a consistent model for ranking outliers in the presence of different attribute types. Preliminary experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high-dimensional data sets.

Authors: Seidl T., Müller E., Assent I., Steinhausen U.
Published in: Dagstuhl Seminar 08421 on Uncertainty Management in Information Systems.
Language: EN
Year: 2008
ISSN: 1862-4405
URL: Dagstuhl seminar 08421, paper at DROPS
Type: Other
Research area: Data Analysis and Knowledge Extraction