Subspace Anytime Stream Clustering

Clustering of high-dimensional streaming data is an emerging field of research. A real-life data stream poses many challenges for the clustering task, as an endless amount of data arrives continuously. Much research has been done on full-space stream clustering. To handle the varying speeds of data streams, "anytime" algorithms have been proposed, but so far only for full-space stream clustering. However, data streams from many application domains contain an abundance of dimensions; clusters often exist only in specific subspaces (subsets of the dimensions) and do not show up in the full feature space.
In this paper, the first algorithm that considers both the high dimensionality and the varying speeds of streaming data is proposed. The algorithm, called SubClusTree, flexibly adapts to different stream speeds and makes the best use of the available time to provide a high-quality subspace clustering. The experimental results demonstrate the effectiveness of our anytime subspace concept.

Authors: Hassani M., Kranen P., Saini R., Seidl T.
Published in: Proc. of the 26th International Conference on Scientific and Statistical Database Management (SSDBM 2014), Aalborg, Denmark.
Publisher: ACM
Language: EN
Year: 2014

Article No. 37

ISBN: 978-1-4503-2722-0
Conference: SSDBM
Type: Conference paper
Research area: Data Analysis and Knowledge Extraction