Why are there two separate map-reduce packages in Apache's Hadoop package tree:
org.apache.hadoop.mapred
http://javasourcecode.org/html/open-source/hadoop/hadoop-1.0.3/org/apache/hadoop/mapred/
org.apache.hadoop.mapreduce
http://javasourcecode.org/html/open-source/hadoop/hadoop-1.0.3/org/apache/hadoop/mapreduce/
Why are they separated out? Is there documentation that clarifies this?
org.apache.hadoop.mapred is the older API and org.apache.hadoop.mapreduce is the new one. The new API was introduced to let programmers write MapReduce jobs in a more convenient and flexible way.
MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop. The term "MapReduce" refers to two separate and distinct tasks that Hadoop programs perform: a map task, which turns input records into intermediate key/value pairs, and a reduce task, which combines the values that share a key.
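As a rough illustration of those two tasks, here is a minimal word-count sketch in plain Java. It does not use Hadoop at all; the class name `WordCountSketch` and its methods are made up for this example, and the in-memory grouping step only stands in for the shuffle that a real cluster performs between map and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce tasks; it does not use Hadoop.
class WordCountSketch {

    // Map task: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce task: combine all values that share a key into a single total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(run(Arrays.asList("the quick brown fox", "the lazy dog")));
    }
}
```

In a real Hadoop job the map and reduce functions would run on different machines and the framework would handle the grouping; the sketch only shows how the two tasks divide the work.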
MapR, by contrast, is a commercial software vendor whose distribution packages Big Data tools such as Apache Hadoop and Apache Spark; it should not be confused with MapReduce. MapReduce is the programming model that Google originally described and that Hadoop implements as the processing layer of its architecture.
Apache Spark, another Apache Software Foundation project, is one alternative to MapReduce, Hadoop's default data processing engine. Spark is a newer data processing engine developed to address the limitations of MapReduce.
They are separated out because the two packages represent two different APIs. org.apache.hadoop.mapred is the older API and org.apache.hadoop.mapreduce is the new one. The new API was introduced to let programmers write MapReduce jobs in a more convenient and flexible way. You might find this presentation useful, which talks about the differences in detail.
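To make the difference in shape concrete, here is a small self-contained sketch. The types in it (`OldMapper`, `NewMapper`, and the nested `OutputCollector`, `Reporter` and `Context`) are simplified local stand-ins written for this example, not the Hadoop classes themselves, and the real `org.apache.hadoop.mapreduce.Mapper` is a concrete class rather than an abstract one. They only mirror one commonly cited difference: in the old API a mapper implements an interface and receives a separate `OutputCollector` and `Reporter`, while in the new API a mapper extends a base class and receives a single `Context` object.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-ins that mirror the shape of the two Hadoop APIs.
// They are NOT the Hadoop classes; they exist so the sketch compiles on its own.
class ApiShapeSketch {

    // --- Old style, modelled on org.apache.hadoop.mapred ---
    interface OutputCollector<K, V> { void collect(K key, V value); }
    interface Reporter { void progress(); }
    interface OldMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter);
    }

    // --- New style, modelled on org.apache.hadoop.mapreduce ---
    static abstract class NewMapper<KI, VI, KO, VO> {
        // A single context object replaces the collector/reporter pair.
        interface Context<K, V> { void write(K key, V value); }
        protected abstract void map(KI key, VI value, Context<KO, VO> context);
    }

    // Old-style mapper: implements the interface, emits through the collector.
    static class OldTokenMapper implements OldMapper<Long, String, String, Integer> {
        public void map(Long offset, String line,
                        OutputCollector<String, Integer> output, Reporter reporter) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) output.collect(word, 1);
            }
            reporter.progress();
        }
    }

    // New-style mapper: extends the base class, emits through the context.
    static class NewTokenMapper extends NewMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, Context<String, Integer> context) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) context.write(word, 1);
            }
        }
    }

    // Hypothetical helpers that call each mapper once and record what it emits.
    static List<String> runOld(String line) {
        List<String> emitted = new ArrayList<>();
        new OldTokenMapper().map(0L, line, (k, v) -> emitted.add(k + "=" + v), () -> {});
        return emitted;
    }

    static List<String> runNew(String line) {
        List<String> emitted = new ArrayList<>();
        new NewTokenMapper().map(0L, line, (k, v) -> emitted.add(k + "=" + v));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runOld("hello hadoop")); // [hello=1, hadoop=1]
        System.out.println(runNew("hello hadoop")); // [hello=1, hadoop=1]
    }
}
```

Both mappers emit the same pairs; only the way they are declared and the way they write output differs. In actual Hadoop code the job setup differs as well (the old API is typically configured through `JobConf`, the new one through `Job`), which this sketch does not cover.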
Hope this answers your question.