Precise Anytime Clustering of Noisy Sensor Data with Logarithmic Complexity

Clustering of streaming sensor data aims at providing online summaries of the observed stream. This task is mostly done under limited processing and storage resources. This makes the sensed stream speed (data per time) a sensitive restriction when designing stream clustering algorithms. Additionally, the varying speed of the stream is a natural characteristic of sensor data, e.g. changing the sampling rate upon detecting an event or for a certain time. In such cases, most clustering algorithms have to heavily restrict their model size such that they can handle the minimal time allowance. Recently the first anytime stream clustering algorithm has been proposed that flexibly uses all available time and dynamically adapts its model size. However, the method was not designed to precisely cluster sensor data which are usually noisy and extremely evolving.

In this paper we detail the LiarTree algorithm that provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model opposed to a linear time complexity often observed in previous approaches. We demonstrate in an extensive experimental evaluation using synthetic and real sensor datasets that the LiarTree outperforms competing approaches in terms of the quality of the resulting summaries and exposes only a logarithmic time complexity.

Authors: Hassani M., Kranen P., Seidl T.
Published in: Proc. 5th International Workshop on Knowledge Discovery from Sensor Data (SensorKDD 2011) in conjunction with 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2011), San Diego, CA, USA
Publisher: ACM - NewYork, NY, USA
Language: EN
Year: 2011
Pages: 52-60
ISBN: 978-1-4503-0832-8
Conference: SensorKDD @KDD
Url:SensorKDD 2011
KDD 2011
Type: Conference papers (peer reviewed)
Research topic: Fast Access to Complex Data