
How to use MultithreadedMapper class in Hadoop Mapreduce?

I came across the MultithreadedMapper class in the new Hadoop version, and the documentation says that it can be used instead of the conventional (single-threaded) mapper class. However, I have not found any demo example that uses this new class. I would also like to use its setNumberOfThreads() method. Is there a code example for this?

Thanks in advance

Harsh asked Feb 21 '23 02:02


1 Answer

A small code snippet for you:

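Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(WebGraphMapper.class);
// The framework runs MultithreadedMapper, which delegates to the real mapper.
job.setMapperClass(MultithreadedMapper.class);
// Job copies conf in its constructor, so set the properties on job.getConfiguration();
// values set on the original conf after this point would not reach the job.
job.getConfiguration().set("mapred.map.multithreadedrunner.class", WebGraphMapper.class.getName());
job.getConfiguration().set("mapred.map.multithreadedrunner.threads", "8");
// rest omitted
job.waitForCompletion(true);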

I think it is pretty self-explanatory. You set the multithreaded mapper as the job's mapper class and then configure which class (your real mapper) it has to run, and with how many threads. There are also convenience static methods that do this configuration for you, so you do not have to spell out the property names yourself. A call could look like this:

MultithreadedMapper.setMapperClass(job, WebGraphMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 8);
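
To make the threading model more concrete, here is a rough plain-Java sketch of the idea, not Hadoop's actual implementation: several worker threads, each with its own mapper instance, pull records from one shared input and write to one shared output, with both accesses synchronized. The names MultithreadedMapSketch, TokenMapper and run are hypothetical and exist only for this illustration; summing the counts in the shared map stands in for work that a reducer would normally do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultithreadedMapSketch {

    // Hypothetical stand-in for the real mapper (e.g. WebGraphMapper).
    static class TokenMapper {
        void map(String line, Map<String, Integer> out) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // The output is shared by all threads, so writes are synchronized.
                synchronized (out) {
                    out.merge(token, 1, Integer::sum);
                }
            }
        }
    }

    // Runs numThreads workers over the same input and returns the merged output.
    static Map<String, Integer> run(List<String> lines, int numThreads) {
        final Iterator<String> input = lines.iterator();
        final Map<String, Integer> output = new TreeMap<>();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            final TokenMapper mapper = new TokenMapper(); // one mapper instance per thread
            workers[i] = new Thread(() -> {
                while (true) {
                    String line;
                    // The input is shared as well, so reading the next record is synchronized.
                    synchronized (input) {
                        if (!input.hasNext()) {
                            return;
                        }
                        line = input.next();
                    }
                    mapper.map(line, output);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c", "a");
        System.out.println(run(lines, 8)); // prints {a=3, b=2, c=1}
    }
}
```

Two practical points follow from this model. Each thread gets its own mapper instance, so instance fields are not shared, but anything static or otherwise shared between instances has to be thread-safe. And because input and output are serialized, extra threads mainly pay off when the map function itself spends its time waiting (for example on external I/O) rather than on reading records.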
Thomas Jungblut answered Mar 05 '23 21:03