Distributed Processing of Data Streams and Large Data Sets

Distributed stream processing systems manage parallel streams that originate from multiple, physically distributed sources. In this talk, a categorization of such systems is given, and two examples of distributed stream processing systems for continuous monitoring on sensor networks are presented. In the second part of the talk, the distribution of single-pass computations is discussed. This distribution is motivated by the fact that, with truly massive data sets such as logs of Internet activity, even a single pass over the data on a single processor is not possible in a reasonable time. A model of algorithms similar to that used in MapReduce is presented, and a comparison with single-pass (streaming) algorithms is discussed.
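
The contrast between a single-pass algorithm and a MapReduce-style distribution of the same computation can be illustrated with a minimal sketch. It is not taken from the talk: the function names, the round-robin partitioning, and the toy log are assumptions chosen for illustration, and the "workers" run sequentially here where a real system would run them in parallel.

```python
from collections import Counter
from functools import reduce

def single_pass_count(records):
    """Streaming style: one processor reads every record exactly once."""
    counts = Counter()
    for record in records:
        counts[record] += 1
    return counts

def map_phase(chunk):
    """Each (hypothetical) worker makes a single pass over its own chunk."""
    return Counter(chunk)

def reduce_phase(left, right):
    """Merge two partial results; the merge is associative and commutative."""
    return left + right

def distributed_count(records, workers=3):
    # Assumed partitioning: round-robin split of the input across workers.
    chunks = [records[i::workers] for i in range(workers)]
    # In a real deployment these calls would run on separate machines.
    partials = [map_phase(chunk) for chunk in chunks]
    return reduce(reduce_phase, partials, Counter())

log = ["/home", "/about", "/home", "/cart", "/home", "/about"]
assert single_pass_count(log) == distributed_count(log)
```

Both functions produce the same counts; the distributed version only works because the partial results can be merged in any order, which is the property that lets a single-pass computation be split across processors.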

 

Authors: Hassani M.
Published in: P. G. Kolaitis, M. Lenzerini and N. Schweikardt: Data Exchange, Integration, and Streams (GI-Dagstuhl-Seminar), Dagstuhl Seminar 10452. Slides [PDF]
Language: EN
Year: 2010
Url: Dagstuhl seminar homepage
Type: Miscellaneous (not reviewed)
Research topic: Data Analysis and Knowledge Extraction