
Why do I get a “Hive support is required to CREATE Hive TABLE (AS SELECT)” error when creating a table?

I'm running into an issue when trying to create a table.

Here is the code to create the table, where the exception is occurring:

sparkSession.sql(s"CREATE TABLE IF NOT EXISTS mydatabase.students(" +
s"name string," + s"age int)")

Here is the spark session configuration:

lazy val sparkSession = SparkSession
  .builder()
  .appName("student_mapping")
  .enableHiveSupport()
  .getOrCreate()

And this is the exception:

org.apache.spark.sql.AnalysisException: Hive support is required to 
CREATE Hive TABLE (AS SELECT);;'CreateTable `mydatabase`.`students`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Ignore

My question is: why is this exception occurring? I have several other Spark programs running with the same session configuration, and they run flawlessly. I'm using Scala 2.11 and Spark 2.3.

asked Jun 18 '18 by G Lor

People also ask

What does enable Hive support do?

Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions. New in version 2.0.

What is Hive support in spark?

Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically.
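For example, with an sbt build (a sketch; the version numbers mirror the question's Scala 2.11 / Spark 2.3 setup), the Hive dependencies come from the spark-hive module:

// build.sbt (illustrative versions)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "2.3.0",
  "org.apache.spark" %% "spark-hive" % "2.3.0"  // Hive SerDes and metastore client
)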

What is table schema in Hive?

In a managed table, both the table data and the table schema are managed by Hive. The data will be located in a folder named after the table within the Hive data warehouse, which is essentially just a file location in HDFS. The location is user-configurable when Hive is installed.
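As a quick illustration (a sketch reusing the mydatabase.students table from the question, on a Hive-enabled session), you can ask Spark where a managed table's data lives:

// The "Location" row points at the table's folder inside the warehouse directory
spark.sql("DESCRIBE FORMATTED mydatabase.students").show(100, truncate = false)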


1 Answer

SparkSession is the entry point to Spark SQL. It is one of the very first objects you create while developing a Spark SQL application.

SessionState is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.

SessionState is available as the sessionState property of a SparkSession.

Internally, sessionState clones the optional parent SessionState (if one was given when the SparkSession was created) or creates a new SessionState using a BaseSessionStateBuilder subclass, as selected by the spark.sql.catalogImplementation configuration property (a quick check is sketched after the list):

in-memory (default) for org.apache.spark.sql.internal.SessionStateBuilder

hive for org.apache.spark.sql.hive.HiveSessionStateBuilder
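A quick way to see which of the two is in effect (an illustrative check, not part of the original answer; spark is any existing SparkSession):

// Returns "hive" when HiveSessionStateBuilder is in use, "in-memory" otherwise
val impl = spark.conf.get("spark.sql.catalogImplementation", "in-memory")
println(impl)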

To use Hive you need the class org.apache.spark.sql.hive.HiveSessionStateBuilder, and according to the documentation this is selected by setting the spark.sql.catalogImplementation property to hive when creating the SparkSession object:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Point the warehouse at HDFS and select the Hive catalog implementation
val conf = new SparkConf()
      .set("spark.sql.warehouse.dir", "hdfs://namenode/sql/metadata/hive")
      .set("spark.sql.catalogImplementation", "hive")
      .setMaster("local[*]")
      .setAppName("Hive Example")

// enableHiveSupport() also sets spark.sql.catalogImplementation=hive
val spark = SparkSession.builder()
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

or you can pass --conf spark.sql.catalogImplementation=hive to spark-submit when you submit your job to the cluster.
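For completeness, here is a minimal sketch (the app name and master are illustrative) that achieves the same thing by setting the property directly on the builder instead of going through a separate SparkConf:

import org.apache.spark.sql.SparkSession

// Setting the property on the builder has the same effect as enableHiveSupport()
val spark = SparkSession.builder()
      .appName("Hive Example")
      .master("local[*]")
      .config("spark.sql.catalogImplementation", "hive")
      .getOrCreate()

// The CREATE TABLE from the question should now succeed
spark.sql("CREATE TABLE IF NOT EXISTS mydatabase.students(name string, age int)")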

answered Oct 23 '22 by Soheil Pourbafrani