Benchmarking Stream Clustering Algorithms within the MOA Framework

In today's applications, massive, evolving data streams are ubiquitous. To extract useful information from this data, real-time clustering analysis of streams is needed. A multitude of stream clustering algorithms have been introduced. However, assessing the effectiveness of such an algorithm is challenging, because until now there has been no tool that allows a direct comparison of these algorithms. We present a novel clustering evaluation framework for data streams. It is an extension of Massive Online Analysis (MOA), a software environment for implementing and evaluating algorithms for online learning from evolving data streams. Our stream clustering evaluation framework includes a collection of online clustering methods and offers tools for extensive evaluation and visualization. Moreover, it allows bidirectional interaction with WEKA, since it uses the same internal data structures. Our framework is designed for extensibility, making it straightforward to add more algorithms, evaluation measures, and data feeds. It is released under the GNU GPL.

Authors: Kranen P., Kremer H., Jansen T., Seidl T., Bifet A., Holmes G., Pfahringer B.
Published in: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), Washington, DC, USA
Language: EN
Year: 2010

2010 KDD - MOA (demo).pdf (Demo)

Conference: KDD
Url: KDD 2010
Type: Conference papers (peer reviewed)
Research topic: Data Analysis and Knowledge Extraction