I'm running into an issue when trying to create a table.
Here is the code to create the table, where the exception is occurring:
sparkSession.sql(
  "CREATE TABLE IF NOT EXISTS mydatabase.students (" +
    "name string, " +
    "age int)")
Here is the spark session configuration:
lazy val sparkSession = SparkSession
.builder()
.appName("student_mapping")
.enableHiveSupport()
.getOrCreate()
And this is the exception:
org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `mydatabase`.`students`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Ignore
My question is: why is this exception occurring? I have several other Spark programs that run flawlessly with the same session configuration. I'm using Scala 2.11 and Spark 2.3.
Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions. New in version 2.0.
Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically.
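In practice that means the spark-hive module has to be on the classpath when the job runs. If you build the job with sbt, a minimal sketch of the dependency section might look like this (the version numbers are assumptions matching the Spark 2.3 / Scala 2.11 setup described above):

// build.sbt -- spark-hive pulls in the Hive dependencies that
// enableHiveSupport() needs to find on the classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "2.3.0" % "provided"
)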
In a managed table, both the table data and the table schema are managed by Hive. The data will be located in a folder named after the table within the Hive data warehouse, which is essentially just a file location in HDFS. The location is user-configurable when Hive is installed.
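As a quick sanity check, you can print the warehouse location your session actually resolved; a minimal sketch, assuming a running session named sparkSession:

// Prints e.g. hdfs://namenode/sql/metadata/hive, or a local
// spark-warehouse directory if spark.sql.warehouse.dir was never set.
println(sparkSession.conf.get("spark.sql.warehouse.dir"))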
SparkSession is the entry point to Spark SQL. It is one of the very first objects you create while developing a Spark SQL application.
SessionState is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.
SessionState is available as the sessionState property of a SparkSession.
Internally, sessionState clones the optional parent SessionState (if one was given when creating the SparkSession) or creates a new SessionState using a BaseSessionStateBuilder, as selected by the spark.sql.catalogImplementation configuration property:
in-memory (default) for org.apache.spark.sql.internal.SessionStateBuilder
hive for org.apache.spark.sql.hive.HiveSessionStateBuilder
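To see which of the two your running session actually picked, you can read the property back (a sketch; the key is readable at runtime but cannot be changed after the session is created):

// Returns "hive" when Hive support is enabled, "in-memory" otherwise.
println(sparkSession.conf.get("spark.sql.catalogImplementation"))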
To use Hive, Spark needs org.apache.spark.sql.hive.HiveSessionStateBuilder, and according to the documentation this is selected by setting the spark.sql.catalogImplementation property to hive when creating the SparkSession object:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .set("spark.sql.warehouse.dir", "hdfs://namenode/sql/metadata/hive")
  .set("spark.sql.catalogImplementation", "hive")
  .setMaster("local[*]")
  .setAppName("Hive Example")

val spark = SparkSession.builder()
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()
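Note that enableHiveSupport() is essentially shorthand for setting spark.sql.catalogImplementation to hive (it also fails fast at startup if the Hive classes are missing from the classpath), so combining it with the explicit .set above is redundant but harmless.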
Alternatively, you can pass the property on the command line with --conf spark.sql.catalogImplementation=hive when you submit your job to the cluster.
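For example (a sketch; the class name and jar are placeholders for your own job):

spark-submit \
  --class com.example.StudentMapping \
  --conf spark.sql.catalogImplementation=hive \
  student-mapping.jar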