Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Connected Components using Hadoop/MapReduce

I need to find connected components for a huge dataset. (Graph being Undirected)

One obvious choice is MapReduce. But i'm a newbie to MapReduce and am quiet short of time to pick it up and to code it myself.

I was just wondering if there is any existing API for the same since it is a very common problem in Social Network Analysis?

Or atleast if anyone is aware of any reliable(tried and tested) source using which atleast i can get started with the implementation myself?

Thanks

like image 480
Shatu Avatar asked May 20 '12 21:05

Shatu


1 Answers

I blogged about it for myself:

http://codingwiththomas.blogspot.de/2011/04/graph-exploration-with-hadoop-mapreduce.html

But MapReduce isn't a good fit for these Graph analysis things. Better use BSP (bulk synchronous parallel) for that, Apache Hama provides a good graph API on top of Hadoop HDFS.

I've written a connected components algorithm with MapReduce here: (Mindist search)

https://github.com/thomasjungblut/tjungblut-graph/tree/master/src/de/jungblut/graph/mapreduce

Also a BSP version for Apache Hama can be found here:

https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/MindistSearch.java

The implementation isn't as difficult as in MapReduce and it is at least 10 times faster. If you're interested, checkout the latest version in TRUNK and visit our mailing list.

http://hama.apache.org/

http://apache.org/hama/mail-lists.html

like image 133
Thomas Jungblut Avatar answered Oct 24 '22 16:10

Thomas Jungblut