I read post on quora which tell that Spark Thrift server is related to Apache Thrift which is d binary communication protocol. Spark Thrift server is the interface to Hive, but how does Spark Thrift server use Apache Thrift for communication with Hive via binary protocol/rpc?
Spark Thrift server is a service that allows JDBC and ODBC clients to run Spark SQL queries. The Spark Thrift server is a variant of HiveServer2.
Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It was developed at Facebook for "scalable cross-language services development" and as of 2020 is an open source project in the Apache Software Foundation.
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results.
Spark Thrift Server is a Hive-compatible interface for Spark.
That means, it creates implementation of HiveServer2
, you can connect with beeline
, however almost all the computation will be computed with Spark, not Hive.
In the previous versions, query parser was from Hive. Currently Spark Thrift Server works with Spark query parser.
Apache Thrift is a framework to develop RPC - Remote Procedure Calls - so there are many implementations using Thrift. Also Cassandra used Thrift, now it's replaced with Cassandra native protocol.
So, Apache Thrift is a framework to develop RPCs, Spark Thrift Server is an implementation of Hive protol, but it uses Spark as a computation framework.
For more details, please see this link from @RussS
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With