Noise-aware Concise Clustering of Streaming Sensor Data in a Logarithmic Time
Online clustering of streaming sensor data aims at providing summaries
of the observed stream. This task is mostly done under limited
processing and storage resources. This makes the sensed stream speed
(data per time) a sensitive restriction when designing stream clustering
algorithms. Additionally, the varying speed of the stream is a natural
characteristic of sensor data, e.g. changing the sampling rate upon
detecting an event or for a certain time. In such cases, most clustering
algorithms have to heavily restrict their model size such that they can
handle the minimal time allowance. Recently the first anytime stream
clustering algorithm has been proposed that flexibly uses all available
time and dynamically adapts its model size. However, the method was not
designed to precisely cluster sensor data which are usually noisy and
extremely evolving.
In this paper we detail the LiarTree algorithm
that provides precise stream summaries and effectively handles noise,
drift and novelty. We prove that the runtime of the LiarTree is
logarithmic in the size of the maintained model opposed to a linear time
complexity often observed in previous approaches. We demonstrate in an
extensive experimental evaluation using synthetic and real sensor
datasets that the LiarTree outperforms competing approaches in terms of
the quality of the resulting summaries and exposes only a logarithmic
time complexity.
| Authors: | Hassani M., Kranen P., Seidl T. |
| Published in: | KDML Workshop on Knowledge Discovery, Data Mining and Machine Learning in conj. with LWA 2011, Magdeburg, Germany |
| Sprache: | EN |
| Jahr: | 2011 |
| Konferenz: | LWA |
| URL: | LWA 2011 |
| Typ: | Sonstige |
| Forschungsgebiet: | Data Analysis and Knowledge Extraction |

