Is Hive faster than Spark?

After I read "What is hive, Is it a database?", a colleague mentioned yesterday that he was able to filter a 15B table, apply a "group by", and join the result with another table, producing 6B records, in only 10 minutes! I wonder whether this would be slower in Spark. With DataFrames the two may now be comparable, but I am not sure, hence the question.

Is Hive faster than Spark? Or is the question not meaningful? Sorry for my ignorance.

He uses the latest Hive, which seems to be using Tez.

asked Sep 09 '16 16:09

gsamaras

1 Answer

Hive is just a framework that gives SQL functionality to MapReduce-type workloads.

Hive can run these workloads on different execution engines: MapReduce, Tez, or Spark, typically on a YARN cluster.

So the meaningful comparison is Hive on Tez vs Hive on Spark. There is a nice article discussing this: "When to go with ETL on Hive using Tez VS When to go with Spark ETL?" (gist: use Hive on Spark if you are not sure).
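One way to check this on your own cluster is to run the same query under each engine and compare wall-clock times. Below is a minimal sketch of that idea, not a benchmark harness: it only builds one HiveQL script per engine. `hive.execution.engine` is a real Hive setting that accepts `mr`, `tez`, and `spark`; the table names (`big_table`, `dim_table`), the columns, and the `build_script` helper are hypothetical, and the query is only shaped like the filter / group by / join workload described in the question.

```python
# Hypothetical helper: builds one HiveQL script per execution engine so the
# same query can be timed under each. Table and column names are made up.

ENGINES = ("mr", "tez", "spark")  # values accepted by hive.execution.engine

# Illustrative query shaped like the workload in the question:
# filter a large table, aggregate it, then join with another table.
QUERY = """
SELECT d.label, agg.cnt
FROM (
    SELECT key, COUNT(*) AS cnt
    FROM big_table          -- hypothetical table
    WHERE event_date >= '2016-01-01'
    GROUP BY key
) agg
JOIN dim_table d            -- hypothetical table
  ON agg.key = d.key
""".strip()


def build_script(engine: str, query: str = QUERY) -> str:
    """Return a HiveQL script that selects `engine` and then runs `query`."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"SET hive.execution.engine={engine};\n{query};\n"


if __name__ == "__main__":
    for engine in ENGINES:
        print(f"-- {engine} --")
        print(build_script(engine))
```

Each generated script could then be saved to a file and run with `hive -f` or `beeline -f`, assuming the chosen engine is actually installed and configured on the cluster.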

Benchmark information (chart not preserved; lower is better)

answered Oct 28 '22 16:10

Krishna Kalyan

Tags: apache-spark, hadoop, hive, bigdata, apache-tez
