 

Pig vs Hive vs Native Map Reduce


I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that require Hive, Pig, or native MapReduce.

I went through a few articles that basically say Hive is for structured processing and Pig is for unstructured processing. When do we need native MapReduce? Can you point out a few scenarios that can't be solved using Pig or Hive but can be solved in native MapReduce?

Maverick asked Jul 30 '13 at 14:07



People also ask

Does Pig differ from MapReduce and Hive?

Hadoop MapReduce programs are typically written in a compiled language such as Java, whereas Apache Pig uses a scripting language (Pig Latin) and Hive uses a SQL-like query language (HiveQL). Pig and Hive provide a higher level of abstraction, whereas Hadoop MapReduce provides a low level of abstraction. As a result, Hadoop MapReduce requires more lines of code than Pig and Hive.
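To make the abstraction gap concrete, here is a minimal, single-process Python sketch of the map, shuffle, and reduce phases that a hand-written MapReduce job implements, using word count as the example. It does not use Hadoop at all; the function names are hypothetical and only model the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all emitted values by key, as the framework would.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive mapreduce"]))
```

In Pig Latin the same computation is usually a few statements built from `TOKENIZE`, `FLATTEN`, `GROUP`, and `COUNT`, and in HiveQL a `GROUP BY` query; the framework supplies the mapper, shuffle, and reducer for you.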

Does Pig use MapReduce?

Pig can run not only on MapReduce but also on the Tez and Spark execution engines, which can provide a significant performance improvement. For the uninitiated, Tez can be thought of as a more efficient, DAG-based alternative to the MapReduce execution framework.

What is the advantage of Pig over MapReduce?

Pig uses a language called Pig Latin, whose operators resemble SQL. Analyzing data in Pig Latin takes far less code than writing raw MapReduce. Pig is a high-level scripting platform for creating programs that run on Hadoop, and it makes it easier to analyze, process, and clean big data without writing vanilla MapReduce jobs.

What is the difference between MapReduce and the Pig language?

Pig is a scripting platform used for exploring large data sets. Its language, Pig Latin, simplifies Hadoop programming by providing a high-level data processing language. Because Pig is script-based, the same functionality can be achieved in very few lines of code. MapReduce, by contrast, is the lower-level programming model Hadoop uses to scale data processing.

What is the difference between Pig and Hive?

The main difference between Pig and Hive is that Pig can process any type of data, structured or unstructured. This makes it well suited to data such as machine-generated (for example, satellite) data, event logs, and schema-less data. With Pig, the data is loaded first, and the programmer then writes a script, based on the data, to give it structure.
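The "load first, impose structure later" idea (often called schema-on-read) can be sketched in plain Python. This is not how Pig is implemented; `apply_schema`, `load`, the schema format, and the comma-separated sample records are all hypothetical. In Pig itself, a schema is typically attached in the `LOAD ... AS (...)` clause.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical schema format: (field name, converter) pairs applied at read time.
Schema = Sequence[Tuple[str, Callable[[str], object]]]


def apply_schema(line: str, schema: Schema, sep: str = ",") -> Dict[str, Optional[object]]:
    # Split the raw line and convert each field; missing or malformed
    # fields become None instead of failing the whole record.
    raw = line.split(sep)
    record: Dict[str, Optional[object]] = {}
    for i, (name, convert) in enumerate(schema):
        try:
            record[name] = convert(raw[i])
        except (IndexError, ValueError):
            record[name] = None
    return record


def load(lines: List[str], schema: Schema) -> List[Dict[str, Optional[object]]]:
    # The data stays as plain text; structure is imposed only when it is read.
    return [apply_schema(line, schema) for line in lines]


if __name__ == "__main__":
    raw_events = ["alice,30,click", "bob,notanumber,view", "carol"]
    event_schema = [("user", str), ("age", int), ("action", str)]
    for row in load(raw_events, event_schema):
        print(row)
```

Because the schema lives in the reading code rather than in the storage, the same raw files can be re-read later with a different schema.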

What is the Hive compiler for MapReduce?

Hive lets you write queries in a SQL-like fashion; in the background, the Hive compiler converts them into MapReduce jobs that execute on the Hadoop cluster. This lets programmers use their existing SQL knowledge rather than learn a new language.
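As a rough illustration of that translation, the Python sketch below hand-compiles one HiveQL-style query into a mapper and a reducer. The `sales` table, its `region` and `amount` columns, and this particular plan are hypothetical; the real Hive compiler produces its own optimized plan (possibly several jobs or map-side aggregation), so this only shows the basic correspondence between SQL clauses and MapReduce phases.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]  # one row of the hypothetical "sales" table


def mapper(row: Row) -> Iterator[Tuple[str, float]]:
    # WHERE amount > 0 is evaluated map-side; the GROUP BY column becomes the key.
    if row["amount"] > 0:
        yield row["region"], row["amount"]


def reducer(key: str, values: List[float]) -> Tuple[str, float]:
    # SUM(amount) is evaluated reduce-side over all values sharing a key.
    return key, sum(values)


def run_query(rows: Iterable[Row]) -> Dict[str, float]:
    # Roughly: SELECT region, SUM(amount) FROM sales WHERE amount > 0 GROUP BY region
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)  # stands in for the shuffle
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    sales = [
        {"region": "eu", "amount": 10.0},
        {"region": "us", "amount": 5.0},
        {"region": "eu", "amount": -3.0},
        {"region": "eu", "amount": 2.5},
    ]
    print(run_query(sales))
```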


1 Answer

Complex branching logic with many nested if .. else .. structures is easier and quicker to implement in standard MapReduce. For processing structured data you could use Pangool, which also simplifies things like JOIN. Standard MapReduce also gives you full control to minimize the number of MapReduce jobs your data processing flow requires, which translates into performance. However, it takes more time to write code and to introduce changes.
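A minimal Python sketch of both points, assuming hypothetical log records with `host`, `status`, `retries`, and `latency_ms` fields: a mapper with nested branching routes each record to several tagged outputs, so a single pass (one job) produces results that might otherwise need one job per output. In a real Java job, Hadoop's `MultipleOutputs` class is one way to write such tagged outputs; here the idea is only simulated in memory.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Record = Dict[str, Any]  # hypothetical log record


def classify(rec: Record) -> Iterator[Tuple[str, str]]:
    # Mapper with nested branching: each record is routed to one or more
    # tagged outputs, keyed by host, in a single pass over the input.
    status = rec.get("status")
    if status == "error":
        if rec.get("retries", 0) > 3:
            yield "alerts", rec["host"]
        else:
            yield "retry_queue", rec["host"]
    elif status == "ok":
        if rec.get("latency_ms", 0) > 500:
            yield "slow", rec["host"]
        yield "ok_counts", rec["host"]
    else:
        yield "malformed", "unknown"


def single_pass_job(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    # One simulated job: the "reduce" step counts records per (output, host),
    # so every tagged output comes from the same pass over the data.
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for rec in records:
        for output, host in classify(rec):
            counts[output][host] += 1
    return {output: dict(hosts) for output, hosts in counts.items()}


if __name__ == "__main__":
    logs = [
        {"host": "a", "status": "error", "retries": 5},
        {"host": "a", "status": "error", "retries": 1},
        {"host": "b", "status": "ok", "latency_ms": 800},
        {"host": "b", "status": "ok", "latency_ms": 100},
        {"status": None},
    ]
    print(single_pass_job(logs))
```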

Apache Pig is good for structured data too, but its advantage is the ability to work with BAGs of data (all rows that are grouped on a key), which makes it simpler to implement things like:

  1. Get the top N elements for each group;
  2. Calculate the total for each group and then attach that total to each row in the group;
  3. Use Bloom filters for JOIN optimisations;
  4. Multiquery support (when Pig tries to minimise the number of MapReduce jobs by doing more work in a single job).
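Items 1 and 2 can be sketched in plain Python by first building the bags and then working on each bag as a whole. In Pig this is typically a `GROUP` followed by a nested `FOREACH`; the sketch below only mimics that shape, and the `(group, item, amount)` record layout and function names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # hypothetical (group, item, amount) record


def group_into_bags(rows: List[Row]) -> Dict[str, List[Row]]:
    # Analogous to Pig's GROUP: each key maps to a bag of all rows with that key.
    bags: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return dict(bags)


def top_n_per_group(rows: List[Row], n: int) -> Dict[str, List[Row]]:
    # Item 1: the N rows with the largest amount in each group.
    return {
        group: heapq.nlargest(n, bag, key=lambda r: r[2])
        for group, bag in group_into_bags(rows).items()
    }


def with_group_total(rows: List[Row]) -> List[Tuple[str, str, int, int]]:
    # Item 2: compute each group's total, then attach it to every row of the group.
    result: List[Tuple[str, str, int, int]] = []
    for bag in group_into_bags(rows).values():
        total = sum(amount for _, _, amount in bag)
        result.extend((group, item, amount, total) for group, item, amount in bag)
    return result


if __name__ == "__main__":
    sales = [("eu", "a", 5), ("eu", "b", 9), ("eu", "c", 1), ("us", "d", 4), ("us", "e", 7)]
    print(top_n_per_group(sales, 2))
    print(with_group_total(sales))
```

Both operations need the whole bag for a key at once, which is exactly what a grouped relation gives you; in hand-written MapReduce you would do this inside the reducer.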

Hive is better suited for ad-hoc queries, but its main advantage is that it has an engine that stores and partitions data. Its tables can nevertheless be read from Pig or standard MapReduce.
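One reason Hive tables stay readable from other tools is that a partitioned Hive table is, on disk, a directory tree with one `column=value` subdirectory per partition. The Python sketch below builds and prunes such paths; the table location, the `dt` and `country` partition columns, and the helper names are hypothetical, and real Hive keeps partition metadata in its metastore rather than discovering it by parsing paths. (For Pig, HCatalog's `HCatLoader` is the usual way to read Hive tables with their metadata.)

```python
from typing import Dict, List


def partition_path(table_dir: str, partition: Dict[str, str]) -> str:
    # Partitions are laid out as nested key=value directories under the table.
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def parse_partition(path: str) -> Dict[str, str]:
    # Recover partition values from a path, as a reader working directly
    # on the files (e.g. a hand-written MapReduce job) might do.
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def prune(paths: List[str], column: str, wanted: str) -> List[str]:
    # Partition pruning: only directories matching the predicate are read,
    # e.g. for a query with WHERE dt = '2013-07-30'.
    return [p for p in paths if parse_partition(p).get(column) == wanted]


if __name__ == "__main__":
    table = "/user/hive/warehouse/logs"
    paths = [
        partition_path(table, {"dt": d, "country": c})
        for d in ("2013-07-29", "2013-07-30")
        for c in ("us", "de")
    ]
    print(prune(paths, "dt", "2013-07-30"))
```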

One more thing: Hive and Pig are not well suited to working with hierarchical data.

alexeipab answered Oct 09 '22 at 02:10
