Effective Evaluation Measures for Subspace Clustering of Data Streams

Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subsets of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data and cannot capture clustering quality under the evolving nature of data streams. On the other hand, available stream clustering evaluation measures consider only the errors of the full-space clustering, not the quality of the subspace clustering.

In this paper we propose what is, to the best of our knowledge, the first subspace clustering measure designed for streaming data: SubCMM (Subspace Cluster Mapping Measure). SubCMM is an effective evaluation measure for stream subspace clustering that is able to handle errors caused by emerging, moving, or splitting subspace clusters. Additionally, we propose a novel method for applying available offline subspace clustering measures to data streams within the Subspace MOA framework.

Authors: Hassani M., Kim Y., Choi S., Seidl T.
Published in: The third workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE'13), held in conjunction with the PAKDD'13 conference (17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, Australia)
Publisher: Springer
Language: EN
Year: 2013
Pages: 342-353
ISBN: 978-3-642-40319-4
ISSN: 0302-9743
Conference: PAKDD
Type: Conference paper
Research area: Data Analysis and Knowledge Extraction