CC-MR - Finding Connected Components in Huge Graphs with MapReduce

 

On this page we offer supplementary material for our paper: Thomas Seidl, Brigitte Boden, Sergej Fries: "CC-MR - Finding Connected Components in Huge Graphs with MapReduce", ECML PKDD 2012.

Here we provide the jar file for running the CC-MR algorithm: ConnectedComponentsBigComps.jar  

As an example of how to run the algorithm, we provide a script in the following file:

script.txt
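To sanity-check results on a small graph, connected components can also be computed on a single machine. The following is a minimal sketch using union-find; it is not the CC-MR algorithm and not part of the provided jar file. It assumes the graph is given as an iterable of (u, v) vertex-id pairs, and the function name `connected_components` is hypothetical. Labelling each vertex with the smallest vertex id in its component is a choice made for this sketch only.

```python
def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Single-machine union-find sketch for small graphs (NOT CC-MR).
    `edges` is assumed to be an iterable of (u, v) pairs of comparable ids.
    """
    parent = {}

    def find(x):
        # Register unseen vertices as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller id as the representative of the merged component.
            if rv < ru:
                ru, rv = rv, ru
            parent[rv] = ru

    return {node: find(node) for node in parent}
```

For example, the edges (1, 2), (2, 3) and (4, 5) yield two components, labelled 1 and 4.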

In the following we provide the synthetic and real-world datasets used in our experiments. Short descriptions of the datasets can be found in the paper.

Some of the .zip archives had to be split into several parts.
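A dataset that comes in several parts can be unpacked into one directory with a few lines of Python. This is a minimal sketch under the assumption that each part is a complete, self-contained .zip file rather than a segment of a multi-volume archive; the helper name `extract_parts` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_parts(archive_paths, target_dir):
    """Extract several .zip archives into one directory.

    Assumes every part is an independent, complete zip file.
    Returns the sorted names of all extracted archive members.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for path in archive_paths:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```

For instance, `extract_parts(["imdb1.zip", "imdb2.zip", "imdb3.zip"], "imdb")` would unpack the three IMDb parts into a directory named `imdb` (the directory name is an arbitrary choice).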

Synthetic Datasets

 

For each of our experiments with synthetic data, the datasets are provided in one or two .zip archives.

 

Varying component diameter: Diameter.zip

 

Varying component sizes: ComponentSize1.zip ComponentSize2.zip

 

Varying number of components: NumberOfComponents1.zip NumberOfComponents2.zip

Real-World Datasets

 

Web-google: web-Google.zip

 

IMDb: imdb1.zip imdb2.zip imdb3.zip

 

DBLP: dblp.zip