Pig vs Hive vs Native Map Reduce

I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that call for Hive, Pig or native MapReduce.

I went through a few articles which basically point out that Hive is for structured processing and Pig is for unstructured processing. When do we need native MapReduce? Can you point out a few scenarios that can't be solved using Pig or Hive but can be solved in native MapReduce?

asked Jul 30 '13 14:07

Maverick

1 Answer

Complex branching logic with many nested if .. else .. structures is easier and quicker to implement in standard MapReduce. For processing structured data you could use Pangool, which also simplifies things like JOINs. Standard MapReduce also gives you full control to minimize the number of MapReduce jobs that your data processing flow requires, which translates into performance. The trade-off is that it takes more time to write the code and to introduce changes.
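
To make the branching point concrete, here is a minimal sketch in plain Python that simulates the map, shuffle and reduce phases in memory. It is not Hadoop code: the record layout (kind, country, amount), the key names and the run_job helper are all made up for illustration. It only shows how nested conditions sit naturally inside a hand-written mapper, and how one job can produce several aggregates in a single pass.

```python
from collections import defaultdict

def mapper(record):
    """Emit (key, value) pairs; nested branching decides what to emit."""
    kind, country, amount = record  # hypothetical record layout
    if kind == "order":
        if amount >= 100:
            if country == "US":
                yield ("large_us_orders", amount)
            else:
                yield ("large_intl_orders", amount)
        else:
            yield ("small_orders", amount)
    elif kind == "refund":
        yield ("refunds", -amount)
    # any other record kind is silently dropped

def reducer(key, values):
    """Sum all values that were shuffled to the same key."""
    return key, sum(values)

def run_job(records):
    """Simulate one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

records = [
    ("order", "US", 250),
    ("order", "DE", 120),
    ("order", "US", 40),
    ("refund", "US", 30),
    ("noise", "US", 999),
]
result = run_job(records)
print(result)
```

All four aggregates come out of one pass over the input, which is the kind of control over job count that the answer refers to.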

Apache Pig is good for structured data too, but its advantage is the ability to work with BAGs of data (all rows that are grouped on a key). That makes it simpler to implement things like:

Get the top N elements for each group;
Calculate a total for each group and then put that total against each row in the group;
Use Bloom filters for JOIN optimisations;
Multiquery support (Pig tries to minimise the number of MapReduce jobs by doing more work in a single job).
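
The first two items in that list can be sketched in plain Python to show what "working with a bag" means. This is only an illustration of the semantics, not Pig Latin and not how Pig executes it; the sample rows and the helper names (group_into_bags, top_n_per_group, with_group_total) are hypothetical.

```python
from collections import defaultdict
import heapq

# Hypothetical rows: (group key, item name, value)
rows = [
    ("fruit", "apple", 5),
    ("fruit", "pear", 3),
    ("fruit", "plum", 9),
    ("veg", "kale", 2),
    ("veg", "leek", 7),
]

def group_into_bags(rows):
    """Collect all rows that share a key into one list (a 'bag')."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return bags

def top_n_per_group(rows, n):
    """Keep the n rows with the largest value inside each bag."""
    return {
        key: heapq.nlargest(n, bag, key=lambda r: r[2])
        for key, bag in group_into_bags(rows).items()
    }

def with_group_total(rows):
    """Append the bag's total to every row in that bag."""
    out = []
    for key, bag in group_into_bags(rows).items():
        total = sum(r[2] for r in bag)
        out.extend(r + (total,) for r in bag)
    return out

top2 = top_n_per_group(rows, 2)
totals = with_group_total(rows)
print(top2)
print(totals)
```

In both functions the whole group is available at once, which is what makes per-group operations short to express.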

Hive is better suited for ad-hoc queries, but its main advantage is that it has an engine that stores and partitions data. Its tables can also be read from Pig or standard MapReduce.
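
As a rough picture of why partitioning matters, the sketch below uses a Python dict to stand in for a partitioned table, with one entry per partition value named in the `column=value` style that Hive uses for partition directories. The data, the `dt` column and the scan function are assumptions made for illustration; real Hive does this at the storage and query-planning level.

```python
# Hypothetical partitioned table: partition name -> rows stored in it
partitions = {
    "dt=2013-07-29": [("alice", 10), ("bob", 20)],
    "dt=2013-07-30": [("carol", 30)],
    "dt=2013-07-31": [("dave", 40), ("erin", 50)],
}

def scan(partitions, dt=None):
    """Return matching rows plus the list of partitions actually read.

    When a partition value is given, every other partition is skipped
    without looking at its rows.
    """
    scanned = []
    rows = []
    for name, data in partitions.items():
        if dt is not None and name != f"dt={dt}":
            continue  # pruned: this partition is never read
        scanned.append(name)
        rows.extend(data)
    return rows, scanned

rows_one_day, read_one_day = scan(partitions, dt="2013-07-30")
rows_all, read_all = scan(partitions)
print(rows_one_day, read_one_day)
print(len(rows_all), read_all)
```

A query filtered on the partition column touches one partition here instead of three, which is the benefit an ad-hoc query gets from data that is already partitioned.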

One more thing: Hive and Pig are not well suited to working with hierarchical data.

answered Oct 09 '22 02:10

alexeipab
