Noise-aware Concise Clustering of Streaming Sensor Data in a Logarithmic Time

Online clustering of streaming sensor data aims at providing summaries of the observed stream. This task is mostly done under limited processing and storage resources. This makes the sensed stream speed (data per time) a sensitive restriction when designing stream clustering algorithms. Additionally, the varying speed of the stream is a natural characteristic of sensor data, e.g. changing the sampling rate upon detecting an event or for a certain time. In such cases, most clustering algorithms have to heavily restrict their model size such that they can handle the minimal time allowance. Recently the first anytime stream clustering algorithm has been proposed that flexibly uses all available time and dynamically adapts its model size. However, the method was not designed to precisely cluster sensor data which are usually noisy and extremely evolving.
In this paper we detail the LiarTree algorithm that provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model opposed to a linear time complexity often observed in previous approaches. We demonstrate in an extensive experimental evaluation using synthetic and real sensor datasets that the LiarTree outperforms competing approaches in terms of the quality of the resulting summaries and exposes only a logarithmic time complexity.

Authors: Hassani M., Kranen P., Seidl T.
Published in: KDML Workshop on Knowledge Discovery, Data Mining and Machine Learning in conj. with LWA 2011, Magdeburg, Germany
Sprache: EN
Jahr: 2011
Konferenz: LWA
URL:LWA 2011
Typ: Sonstige
Forschungsgebiet: Data Analysis and Knowledge Extraction