Lab Data Mining Algorithms: "High Performance Data Mining with MapReduce"
Lecturer: Univ.-Prof. Dr. rer. nat. T. Seidl
Dipl.-Inform. S. Fries
Dipl.-Inform. B. Boden
Data mining techniques can be applied to extract valuable knowledge from data repositories, e.g. through clustering, classification or association rule mining.
Because most existing algorithms run on a single processor, they are applicable only to small or medium-sized datasets. Many applications, however, involve huge amounts of data, which makes it necessary to develop massively distributed algorithms.
Algorithms can be parallelized in several ways. Two helpful programming models are "MapReduce", which was developed by Google and is implemented in the open source software Hadoop, and its extension PACT.
The aim of this software lab is to apply the MapReduce model (and possibly its extensions) to develop parallelized versions of existing data mining algorithms and to evaluate them.
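To give a rough idea of the MapReduce model, the following minimal sketch counts item support, a basic step in association rule mining. It runs in a single process and does not use the Hadoop API; the class name ItemSupportCount, the methods map, reduce and run, and the sample transactions are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-process illustration of the map, shuffle and reduce phases (not Hadoop code). */
class ItemSupportCount {

    /** Map phase: emit (item, 1) once for every distinct item of one transaction. */
    static List<Map.Entry<String, Integer>> map(List<String> transaction) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String item : new HashSet<>(transaction)) {
            out.add(Map.entry(item, 1));
        }
        return out;
    }

    /** Reduce phase: sum all values that were emitted for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Driver: apply map to every transaction, group by key (the "shuffle"), then reduce. */
    static Map<String, Integer> run(List<List<String>> transactions) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> transaction : transactions) {
            for (Map.Entry<String, Integer> pair : map(transaction)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> support = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            support.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return support;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three market-basket transactions.
        List<List<String>> transactions = List.of(
            List.of("bread", "milk"),
            List.of("bread", "butter"),
            List.of("milk", "bread", "eggs"));
        System.out.println(run(transactions)); // prints {bread=3, butter=1, eggs=1, milk=2}
    }
}
```

In a distributed setting the calls to map and reduce would run on different machines, and the grouping step would be carried out by the framework rather than by a local loop. Because map looks at one transaction at a time and reduce at one key at a time, the work can be split across many nodes.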
Knowledge from the lecture "Data Mining Algorithms" is helpful but not mandatory. Java programming skills are important for implementing the algorithms.
The initial meeting will take place on Wednesday, February 8th, at 10:00 in room 6329 (seminar room i9).
Registration is open from January 11 to January 23, 2012 via the central registration for seminars and labs: https://www.graphics.rwth-aachen.de/apse/