 

Is it better to use the mapred or the mapreduce package to create a Hadoop Job?

To create MapReduce jobs you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers, Reducers, Jobs ... The old package had been marked as deprecated, but this has since been reverted. Now I wonder whether it is better to use the old mapred package or the new mapreduce package to create a job, and why. Or does it just depend on whether you need something like MultipleTextOutputFormat, which is only available in the old mapred package?

asked Sep 29 '11 13:09 by momo13



1 Answer

Functionality-wise there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that in the old API records are pushed to the mapper/reducer, while the new API supports both a push and a pull mechanism. You can get more information about the pull mechanism here.
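In the new API, org.apache.hadoop.mapreduce.Mapper has a run(Context) method whose default implementation loops while context.nextKeyValue() returns true and calls map() once per record; overriding run() lets a mapper pull records itself. In the old API's Mapper interface, the framework drives the loop and calls map() for each record. The sketch below is a plain-Java toy model of the two styles, not Hadoop code: ToyContext, PushMapper, runPush and runPullPairs are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model in plain Java, NOT Hadoop code: ToyContext, PushMapper, runPush and
// runPullPairs are hypothetical names chosen only to illustrate push vs. pull delivery.
public class PushPullDemo {

    // Hypothetical stand-in for a record reader plus an output sink.
    static class ToyContext {
        private final Iterator<String> records;
        private String current;
        final List<String> output = new ArrayList<>();

        ToyContext(List<String> input) {
            this.records = input.iterator();
        }

        // Shaped like Context.nextKeyValue(): advance and report whether a record is available.
        boolean nextKeyValue() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }

        void write(String value) {
            output.add(value);
        }
    }

    // Push style: user code sees exactly one record per call.
    interface PushMapper {
        void map(String value, ToyContext ctx);
    }

    // The "framework" owns the loop and pushes each record into map().
    static void runPush(PushMapper mapper, ToyContext ctx) {
        while (ctx.nextKeyValue()) {
            mapper.map(ctx.getCurrentValue(), ctx);
        }
    }

    // Pull style: user code owns the loop, so it can fetch two records per output
    // (or stop early) instead of being handed one record per call.
    static void runPullPairs(ToyContext ctx) {
        while (ctx.nextKeyValue()) {
            String first = ctx.getCurrentValue();
            ctx.write(ctx.nextKeyValue() ? first + "+" + ctx.getCurrentValue() : first);
        }
    }

    public static void main(String[] args) {
        ToyContext push = new ToyContext(List.of("a", "b", "c"));
        runPush((value, ctx) -> ctx.write(value.toUpperCase()), push);
        System.out.println(push.output); // [A, B, C]

        ToyContext pull = new ToyContext(List.of("a", "b", "c"));
        runPullPairs(pull);
        System.out.println(pull.output); // [a+b, c]
    }
}
```

The pairing logic in runPullPairs is only possible because the mapper controls when the next record is fetched; with push delivery it would have to buffer state between map() calls instead.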

Also, the old API has been un-deprecated since 0.21. You can find more information about the new API here.

As you mentioned, some classes (like MultipleTextOutputFormat) have not been migrated to the new API. For this reason and the one mentioned above, it's better to stick with the old API (although translating between the two is usually quite simple).
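For comparison, the old org.apache.hadoop.mapred.Mapper is an interface with map(key, value, OutputCollector output, Reporter reporter), while the new org.apache.hadoop.mapreduce.Mapper is a class with map(key, value, Context context) that emits through context.write(key, value). The plain-Java sketch below uses hypothetical interfaces (Collector, OldStyleMapper, ContextLike, NewStyleMapper), not Hadoop's, to illustrate why porting map logic is usually mechanical: the body stays the same and each collect call becomes a write on the context.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model in plain Java, NOT Hadoop code: the interfaces below are hypothetical and only
// mirror the shape of the old (mapred) and new (mapreduce) map() signatures.
public class ApiTranslationDemo {

    // Plays the role of the old API's OutputCollector (collect(key, value)).
    interface Collector {
        void collect(String key, int value);
    }

    // Old style: output goes through a collector argument (the Reporter argument is omitted).
    interface OldStyleMapper {
        void map(long offset, String line, Collector out);
    }

    // Plays the role of the write(key, value) method on the new API's Context.
    interface ContextLike {
        void write(String key, int value);
    }

    // New style: a single context object carries the output.
    interface NewStyleMapper {
        void map(long offset, String line, ContextLike ctx);
    }

    // Word-count map logic written once against the old-style signature.
    static final OldStyleMapper OLD_WORD_COUNT = (offset, line, out) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    };

    // The translation is mechanical: the body is unchanged, and every collect(k, v)
    // becomes a write(k, v) on the context.
    static NewStyleMapper translate(OldStyleMapper old) {
        return (offset, line, ctx) -> old.map(offset, line, ctx::write);
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        NewStyleMapper mapper = translate(OLD_WORD_COUNT);
        mapper.map(0L, "to be or not to be", (key, value) -> emitted.add(key + "=" + value));
        System.out.println(emitted); // [to=1, be=1, or=1, not=1, to=1, be=1]
    }
}
```

In a real port you would typically rewrite the class against the new signature rather than wrap it; the adapter here just makes the one-to-one correspondence explicit.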

answered Sep 22 '22 20:09 by Praveen Sripati