An Effective Evaluation Measure for Clustering on Evolving Data Streams

Due to the ever-growing presence of data streams, there has been a considerable amount of research on stream mining algorithms. While many algorithms have been introduced that tackle the problem of clustering on evolving data streams, hardly any attention has been paid to appropriate evaluation measures. Measures developed for static scenarios, namely structural measures and ground-truth-based measures, cannot correctly reflect errors attributable to emerging, splitting, or moving clusters. These situations are inherent to the streaming context because the data distribution changes dynamically.

In this paper, we develop a novel evaluation measure for stream clustering called the Cluster Mapping Measure (CMM). CMM effectively indicates different types of errors by taking the important properties of evolving data streams into account. We show in extensive experiments on real and synthetic data that CMM is a robust measure for stream clustering evaluation.

Authors: Kremer H., Kranen P., Jansen T., Seidl T., Bifet A., Holmes G., Pfahringer B.
Published in: Proc. of the 17th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD 2011), San Diego, CA, USA
Publisher: ACM - New York, NY, USA
Language: EN
Year: 2011
Pages: 868-876
ISBN: 978-1-4503-0813-7
Conference: KDD
DOI: 10.1145/2020408.2020555
Url: KDD 2011
Type: Conference papers (peer reviewed)
Research topic: Data Analysis and Knowledge Extraction