
Difference between hive thrift server from hive and spark distributions

What's the difference between running the Hive server with each of the following two commands?

  1. hive --service hiveserver2
  2. ./start-thriftserver.sh, run from spark/sbin (the Hive Thrift server shipped with Spark)

Do they listen on separate ports?

Which one should I use to establish a JDBC connection with the Apache Hive JDBC driver from my Java class?

BludShot asked Mar 17 '15 12:03



2 Answers

HiveServer2 is the Hive SQL engine, which can use MapReduce, Spark, or Tez as the execution engine. Hive creates the execution plan and then invokes the execution engine to run the query. The optimisation is done by Hive.

I am a heavy Spark user, but wanted Hive available to run ad hoc queries through Hue. After some research, I found that Hive 1.2.1 supports up to Spark 1.4.1 as the execution engine. Hive 2 has a dependency on Spark 1.5, but I have not tried to run it with 1.5 or 1.6.

The Spark Thrift server can replace HiveServer2. It uses Spark to run the query and builds its own execution plan (which may or may not be better than Hive's), and it gives you access to other Spark sources such as RDDs, text files, etc. Of course, you can run the Thrift server with the latest version of Spark.
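
On the JDBC part of the question: as far as I know, both servers speak the HiveServer2 Thrift protocol, so the same Hive JDBC driver and the same `jdbc:hive2://` URL scheme should work against either one. Both also default to port 10000 unless `hive.server2.thrift.port` is overridden, so they would not listen on separate ports out of the box. Below is a minimal sketch, not a definitive implementation. The class name `ThriftJdbcSketch`, the helper names, the host `localhost`, the database `default`, and the empty password are assumptions for illustration; it also assumes the `hive-jdbc` jar and its dependencies are on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ThriftJdbcSketch {

    // Builds a HiveServer2-style JDBC URL. The same scheme is assumed to work
    // for both HiveServer2 and the Spark Thrift server.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection, runs a trivial query, and prints the single result.
    // Assumes an unsecured server that accepts an empty password.
    static void runSmokeQuery(String url, String user) throws SQLException {
        // Older driver versions may need an explicit
        // Class.forName("org.apache.hive.jdbc.HiveDriver") before this call.
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // 10000 is the default Thrift port for both servers; change it if
        // hive.server2.thrift.port was overridden on your server.
        String url = buildUrl("localhost", 10000, "default");
        System.out.println(url);

        // Only try to connect when a user name is passed, so the sketch can
        // be run without a live server.
        if (args.length > 0) {
            runSmokeQuery(url, args[0]);
        }
    }
}
```

If you want both servers on one host, start one of them on another port and pass that port to `buildUrl`; the client code should not need to change otherwise.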

jonathanChap answered Sep 28 '22 07:09



I guess both do the same thing, except that when you start the Hive Thrift server from Spark, it adds one more CLI service to the Thrift server, which should add the Spark SQL context to the Thrift API.

vikas answered Sep 28 '22 05:09
