Distributed Processing of Data Streams and Large Data Sets
Distributed stream processing systems manage parallel streams that originate from multiple, physically distributed sources. This talk gives a categorization of such systems and presents two examples of distributed stream processing systems for continuous monitoring on sensor networks. The second part of the talk discusses distributing single-pass computations. This distribution is motivated by the fact that, with truly massive data sets such as logs of internet activity, even a single pass over the data on a single processor cannot be completed in a reasonable time. A model of algorithms similar to those used in MapReduce is presented and compared with single-pass (streaming) algorithms.
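To illustrate the idea of distributing a single-pass computation, the following is a minimal sketch and not the model presented in the talk. It assumes a simple mergeable summary (a count and a sum); the names `Summary`, `single_pass`, and `distributed_pass` are hypothetical and chosen only for this example. Each partition is processed in one pass, as a streaming algorithm would do, and the partial summaries are then combined in a reduce step, in the spirit of MapReduce.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List


@dataclass
class Summary:
    """Hypothetical mergeable summary: number of items seen and their sum."""
    count: int = 0
    total: int = 0

    def update(self, value: int) -> None:
        # Constant work per item, as in a single-pass (streaming) algorithm.
        self.count += 1
        self.total += value

    def merge(self, other: "Summary") -> "Summary":
        # Combining two partial summaries gives the summary of the union.
        return Summary(self.count + other.count, self.total + other.total)


def single_pass(stream: Iterable[int]) -> Summary:
    """Process the whole stream once on a single processor."""
    summary = Summary()
    for value in stream:
        summary.update(value)
    return summary


def distributed_pass(partitions: List[Iterable[int]]) -> Summary:
    """Run one single pass per partition, then merge the partial results.

    The list comprehension stands in for the map step; in a real system
    each partition would be handled by a separate worker in parallel.
    """
    partials = [single_pass(partition) for partition in partitions]
    return reduce(Summary.merge, partials, Summary())


if __name__ == "__main__":
    data = list(range(1, 101))
    partitions = [data[i::4] for i in range(4)]  # four assumed workers
    print(single_pass(data))
    print(distributed_pass(partitions))
```

The sketch works only because the summary can be merged without revisiting the data; computations whose state cannot be combined this way do not distribute as directly.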
|Published in:||P. G. Kolaitis, M. Lenzerini and N. Schweikardt: Data Exchange, Integration, and Streams (GI-Dagstuhl-Seminar), Dagstuhl Seminar 10452. Slides [PDF]|
|URL:||Dagstuhl seminar homepage|
|Research area:||Data Analysis and Knowledge Extraction|