Parallel Implementation of a Density-based Stream Clustering Algorithm over a GPU Scheduling System

Graphics Processing Units (GPUs) are used together with the CPU to accelerate a wide range of general-purpose applications and scientific computations. The highly parallel architecture of the GPU consists of hundreds of cores optimized for parallel performance. Applications that aim to benefit from the GPU architecture have to be implemented according to the GPU parallel programming model, so an algorithm that follows a sequential workflow has to be redesigned to achieve good performance on the GPU device. DenStream is a recent stream clustering algorithm that consists of two main parts: the online part summarizes data from the data stream and builds micro-clusters, whereas the offline part generates the final clustering using density-based clustering.
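To make the two-part structure concrete, here is a minimal single-threaded Python sketch of a DenStream-style pipeline, written under simplifying assumptions. It follows the general scheme of DenStream (exponential fading by 2^(-λ·Δt), faded linear and squared sums per micro-cluster, potential and outlier micro-clusters, promotion once the weight exceeds β·μ). However, the names `MicroCluster`, `online_insert` and `offline_cluster`, the parameter names `eps`, `lam`, `beta` and `mu`, the RMS-radius definition and the 2·eps connectivity rule of the offline step are illustrative assumptions. Periodic pruning of stale micro-clusters is omitted, and this is not the code evaluated in the paper.

```python
import math


class MicroCluster:
    """Simplified DenStream-style micro-cluster with exponentially faded statistics."""

    def __init__(self, point, t, lam):
        self.lam = lam                     # decay rate (lambda); assumed parameter name
        self.t = t                         # time of the last update
        self.w = 1.0                       # faded weight
        self.cf1 = list(point)             # faded linear sum, per dimension
        self.cf2 = [x * x for x in point]  # faded sum of squares, per dimension

    def fade(self, t):
        # Age all statistics by the fading factor 2^(-lambda * elapsed time).
        f = 2.0 ** (-self.lam * (t - self.t))
        self.w *= f
        self.cf1 = [c * f for c in self.cf1]
        self.cf2 = [c * f for c in self.cf2]
        self.t = t

    def center(self):
        return [c / self.w for c in self.cf1]

    def radius_if_added(self, point):
        # RMS distance from the center after tentatively absorbing `point`.
        w = self.w + 1.0
        cf1 = [c + x for c, x in zip(self.cf1, point)]
        cf2 = [c + x * x for c, x in zip(self.cf2, point)]
        var = sum(sq / w - (lin / w) ** 2 for sq, lin in zip(cf2, cf1))
        return math.sqrt(max(var, 0.0))

    def add(self, point):
        self.w += 1.0
        self.cf1 = [c + x for c, x in zip(self.cf1, point)]
        self.cf2 = [c + x * x for c, x in zip(self.cf2, point)]


def online_insert(p_mcs, o_mcs, point, t, eps, lam, beta, mu):
    """Online part: absorb one point into a potential (p) or outlier (o) micro-cluster."""
    for mc in p_mcs + o_mcs:
        mc.fade(t)
    for group in (p_mcs, o_mcs):  # try p-micro-clusters first, then o-micro-clusters
        if not group:
            continue
        nearest = min(group, key=lambda mc: math.dist(mc.center(), point))
        if nearest.radius_if_added(point) <= eps:
            nearest.add(point)
            if group is o_mcs and nearest.w > beta * mu:
                o_mcs.remove(nearest)  # promote: weight now exceeds beta * mu
                p_mcs.append(nearest)
            return
    o_mcs.append(MicroCluster(point, t, lam))  # no fit: start a new outlier micro-cluster


def offline_cluster(p_mcs, eps, mu):
    """Offline part: connected components over core micro-clusters (weight >= mu)."""
    cores = [mc for mc in p_mcs if mc.w >= mu]
    labels = [-1] * len(cores)
    next_label = 0
    for i in range(len(cores)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # depth-first expansion, as in DBSCAN
            a = stack.pop()
            for b in range(len(cores)):
                close = math.dist(cores[a].center(), cores[b].center()) <= 2 * eps
                if labels[b] == -1 and close:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1
    return cores, labels
```

For example, feeding six points from two well-separated groups, three near (0, 0) and three near (10, 10), at times 0 to 5 with `eps=1.0, lam=0.01, beta=0.5, mu=2.0` leaves two potential micro-clusters and no outlier micro-clusters, and the offline step labels them as two separate clusters.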

In this work, we present an efficient GPU-based implementation of DenStream called G-DenStream. G-DenStream is faster than DenStream, especially as the dimensionality of the streaming dataset increases, while keeping the quality of the resulting clustering unchanged. The implementations in this work parallelize both the online and the offline parts, and we test their performance and their utilization of the GPU.
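The kind of independence a GPU port can exploit is easiest to see in the nearest-micro-cluster search of the online part: the distance from each point to each micro-cluster center can be computed without reference to any other pair, and a per-point argmin then reduces the results. The hypothetical sketch below mirrors that map-then-reduce layout on the CPU, assuming a batch of points is assigned against fixed centers (a simplification of DenStream's point-by-point updates). A thread pool stands in for the GPU grid, `kernel`, `pairwise_sq_dist` and `assign_nearest` are illustrative names, and this is not the G-DenStream kernel from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def pairwise_sq_dist(points, centers, workers=4):
    """Squared distances for every (point, center) pair, one task per pair."""
    n, k = len(points), len(centers)

    def kernel(tid):
        # Flat task index -> (point row, center column), like a CUDA global thread id.
        i, j = divmod(tid, k)
        return sum((a - b) ** 2 for a, b in zip(points[i], centers[j]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(kernel, range(n * k)))
    # Reshape the flat, row-major result buffer into an n x k matrix.
    return [flat[i * k:(i + 1) * k] for i in range(n)]


def assign_nearest(points, centers):
    """Per-point reduction: index of the closest center (an argmin per row)."""
    if not centers:
        return [None] * len(points)
    dists = pairwise_sq_dist(points, centers)
    return [min(range(len(row)), key=row.__getitem__) for row in dists]
```

For instance, `assign_nearest([(0, 0), (9, 9), (1, 0)], [(0, 0), (10, 10)])` returns `[0, 1, 0]`. Because of Python's global interpreter lock, the thread pool reproduces only the structure, not a speedup; in a CUDA port each `tid` could map to one GPU thread and the argmin could become a parallel reduction. The per-pair work in `kernel` grows linearly with the dimensionality of the points.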

Authors: M. Hassani, A. Tarakji, L. Georgiev, T. Seidl
Published in: Workshop on Scalable Data Analytics: Theory and Applications (SDA'14), held in conjunction with PAKDD'14 (the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Tainan, Taiwan)
Publisher: Springer International Publishing
Language: EN
Year: 2014
Pages: 441-453
ISBN: 978-3-319-13186-3
ISSN: 0302-9743
Conference: PAKDD
DOI: 10.1007/978-3-319-13186-3_40
URL: SDA14, PAKDD14
Type: Conference papers (peer reviewed)
Research topic: Data Analysis and Knowledge Extraction