Cracker: Connected Components

ABSTRACT
The problem of finding connected components in a graph is common to several applications, such as social network analysis, web graph mining and image processing. The exponentially growing size of these graphs requires the definitions of new computational models and algorithms for processing them on highly distributed architectures, e.g., exploiting MapReduce. In this paper we present CRACKER , an efficient iterative algorithm to detect connected components in large graphs. The strategy of CRACKER is to iteratively grow a spanning tree for each connected component of the graph. Nodes belonging to trees are discarded from the computation in the subsequent iterations. Even if CRACKER requires few additional iterations, we show that it drastically reduces both the total computation time and the volume of message exchanged. We prove the correctness of the algorithm and provide an extensive experimental evaluation. The experimental results show that CRACKER outperforms state-of-the-art methods considering a wide variety of synthetic and real-world graphs

PUBLICATIONS
“Cracker: Crumbling Large Graphs Into Connected Components” was accepted at the 20th IEEE Symposium on Computers and Communications, Larnaca, Cyprus, 6–9 July 2015

CODE
The code used for the experiments is available at:
https://github.com/hpclab/cracker

DATASET
In the following it is possible to find the datasets used in our experimental section

Synthetic datasets:
Diameter
Graph scale
CC Number

Real datasets:
Twitter Sparse
Italy road networks
Indonesia road networks
Protein-to-Protein interactions All species

Please do not hesitate to contact me in case of any issues: