Subspace Clustering for Uncertain Data

Analyzing uncertain databases is a challenge in data mining research. Data mining methods usually rely on databases in which precise values are given. In scenarios where values are uncertain, e.g. noisy sensor readings, these algorithms cannot deliver patterns of high quality. Moreover, modern applications often process high-dimensional data. One important data mining method for high-dimensional data is subspace clustering, which finds groups of objects together with the dimensions that are locally relevant to each group. Deciding whether dimensions are relevant for a cluster is even more challenging for uncertain data, so approaches for subspace clustering on uncertain databases are needed. In this paper, we develop a subspace clustering method for uncertain data that delivers high-quality patterns by making effective use of the information provided by the individual distributions of the objects. Because a strict assignment of objects to single clusters is not appropriate in uncertain scenarios, we enrich our model with the concept of membership degree. Due to the complexity of this approach, we propose an efficient algorithm. In thorough experiments, we show the effectiveness and efficiency of our new subspace clustering model.

Authors: Günnemann S., Kremer H., Seidl T.
Published in: Proc. SIAM International Conference on Data Mining (SDM 2010), Columbus, Ohio, USA.
Publisher: SIAM
Language: EN
Year: 2010
Pages: 385-396
Conference: SDM
URL: http://www.siam.org/proceedings/datamining/2010/dm10_034_gunnemanns.pdf
Type: Conference papers (peer reviewed)
Research topic: Data Analysis and Knowledge Extraction