What's the difference between starting the Hive server with each of the following two commands?
hive --service hiveserver2
spark/sbin$ ./start-thriftserver.sh
Do they listen on separate ports?
Which one should I use to establish a JDBC connection using the Apache Hive JDBC driver in my Java class?
HiveServer2 is the Hive SQL server, which can use MapReduce, Spark or Tez as the execution engine. Hive creates the execution plan and then invokes the execution engine to run the query. The optimisation is done by Hive.
I am a heavy Spark user, but wanted Hive available to run ad hoc queries through Hue. After some research I can see that Hive 1.2.1 supports up to Spark 1.4.1 as the execution engine. Hive 2 has a dependency on Spark 1.5, but I have not tried to run it with 1.5 or 1.6.
The Spark Thrift server can replace HiveServer2. It uses Spark to run the query and builds its own execution plan (which may or may not be better than Hive's), and it gives you access to other Spark sources such as RDDs, text files etc. Of course, you can run the Thrift server with the latest version of Spark.
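On the JDBC part of the question: since the Spark Thrift server speaks the HiveServer2 protocol, the same Hive JDBC driver and `jdbc:hive2://` URL scheme should work against either server. Both default to port 10000, so to run them side by side on one host you would need to move one of them (for example via `hive.server2.thrift.port`). Below is a minimal sketch, not a definitive setup. The host, port, database, user name and the `HiveJdbcExample` class with its `buildUrl` helper are placeholders I made up for illustration, and it assumes the Hive JDBC driver jar is on the classpath so that `DriverManager` can find it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveJdbcExample {

    // Hypothetical helper: builds a HiveServer2-style JDBC URL.
    // The same URL scheme is used whether the server behind it is
    // HiveServer2 or the Spark Thrift server.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: a server is listening on localhost:10000 (the default
        // port for both servers) and accepts this placeholder user.
        String url = buildUrl("localhost", 10000, "default");

        // Requires the Hive JDBC driver on the classpath; JDBC 4 drivers
        // register themselves, so no explicit Class.forName call is made here.
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

If you start the second server on a different port, only the port passed to `buildUrl` changes; the Java code stays the same.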
I guess both do the same thing, except that when you start the Hive Thrift server from Spark, it adds one more CLI service to the Thrift server, which should add the Spark SQL context to the Thrift API.