Difference between Pig and Hive? Why have both? [closed]

2 Answers

Check out this post from Alan Gates, Pig architect at Yahoo!, that compares when you would use a SQL-like language such as Hive rather than Pig. He makes a convincing case for the usefulness of a procedural language like Pig (vs. declarative SQL), particularly for dataflow designers.
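To make the procedural vs. declarative distinction concrete, here is a rough sketch in plain Python (not Pig itself). It mimics the step-by-step dataflow style, where every intermediate result has a name and can be inspected. The `clicks` data set, its `user`/`url` fields, and all function names are made up for illustration; the comments show what I'd expect the approximate Pig Latin and SQL equivalents to look like.

```python
from collections import defaultdict

# Approximate Pig Latin equivalent (hypothetical relation and field names):
#   clicks   = LOAD 'clicks' AS (user:chararray, url:chararray);
#   filtered = FILTER clicks BY url IS NOT NULL;
#   grouped  = GROUP filtered BY user;
#   counts   = FOREACH grouped GENERATE group, COUNT(filtered);
#
# Approximate declarative SQL equivalent:
#   SELECT user, COUNT(*) FROM clicks WHERE url IS NOT NULL GROUP BY user;


def load(rows):
    """Step 1: turn raw (user, url) tuples into records."""
    return [{"user": user, "url": url} for user, url in rows]


def drop_missing_urls(records):
    """Step 2: keep only records that have a URL."""
    return [r for r in records if r["url"] is not None]


def group_by_user(records):
    """Step 3: collect records per user."""
    groups = defaultdict(list)
    for r in records:
        groups[r["user"]].append(r)
    return dict(groups)


def count_per_user(groups):
    """Step 4: count the records in each group."""
    return {user: len(rs) for user, rs in groups.items()}


def clicks_per_user(rows):
    """Run the four steps in order, one named intermediate at a time."""
    clicks = load(rows)
    filtered = drop_missing_urls(clicks)
    grouped = group_by_user(filtered)
    return count_per_user(grouped)
```

In the stepwise version you decide the order of operations and can branch off any intermediate result; in the SQL version you state the result you want and leave the plan to the engine.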

answered Sep 21 '22 12:09

Jakob Homan

Hive was designed to appeal to a community comfortable with SQL. Its philosophy was that we don't need yet another scripting language. Hive supports map and reduce transform scripts in the language of the user's choice (these can be embedded within SQL clauses). It is widely used at Facebook by analysts comfortable with SQL as well as by data miners programming in Python. SQL compatibility efforts in Pig have been abandoned AFAIK, so the difference between the two projects is very clear.
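As an illustration of the transform-script idea, here is a minimal sketch of a Python script that could be plugged into a Hive query through the `TRANSFORM ... USING` clause. Hive feeds the selected columns to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The script name `extract_domain.py`, the `page_views` table, and the `user_id`/`url` columns are assumptions made up for this example, and it assumes Python 3 is available on the cluster nodes.

```python
import sys
from urllib.parse import urlparse

# Hypothetical HiveQL that would invoke this script (names are illustrative):
#   ADD FILE extract_domain.py;
#   SELECT TRANSFORM (user_id, url)
#          USING 'python3 extract_domain.py'
#          AS (user_id, domain)
#   FROM page_views;


def transform_line(line):
    """Map one 'user_id<TAB>url' input row to 'user_id<TAB>domain'."""
    user_id, url = line.rstrip("\n").split("\t", 1)
    # urlparse leaves netloc empty when the value has no scheme/host part.
    domain = urlparse(url).netloc or "unknown"
    return f"{user_id}\t{domain}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream rows from stdin to stdout, one output row per input row."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

The same script can be tried locally by piping a tab-separated file into it, which is a convenient way to debug before running it from a query.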

Supporting SQL syntax also means that it's possible to integrate with existing BI tools like MicroStrategy. Hive has an ODBC/JDBC driver (still a work in progress) that should allow this in the near future. Hive is also beginning to add support for indexes, which should enable the drill-down queries common in such environments.

Finally, although this is not directly pertinent to the question, Hive is a framework for performing analytic queries. While its dominant use is to query flat files, there's no reason why it cannot query other stores. Currently Hive can be used to query data stored in HBase (a key-value store like those found in the guts of most RDBMSes), and the HadoopDB project has used Hive to query a federated RDBMS tier.

answered Sep 21 '22 12:09

Joydeep Sen Sarma

Tags:

hadoop

hive

apache-pig

Arnkrishn
