I'm getting an error while trying to run the following code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)
How can it be resolved?
To create a SparkSession in Java, Scala, or Python, you use the builder pattern: SparkSession.builder() returns a builder, and calling getOrCreate() on it either returns the existing SparkSession if one already exists or creates a new one.
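As a minimal Java sketch (the application name and master URL are placeholders, not taken from your code):

import org.apache.spark.sql.SparkSession;

public class SessionExample {
    public static void main(String[] args) {
        // Build (or reuse) a SparkSession; appName and master are illustrative only
        SparkSession spark = SparkSession
                .builder()
                .appName("example-app")   // hypothetical application name
                .master("local[*]")       // run locally; in a cluster this comes from spark-submit
                .getOrCreate();

        spark.stop();
    }
}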
Spark also provides the HiveContext class to access Hive tables directly. In PySpark you import it with "from pyspark.sql import HiveContext" and then use it to create a Hive-aware context and read Hive tables into a Spark DataFrame. (Note that in Spark 2.x HiveContext is deprecated in favor of a SparkSession created with enableHiveSupport().)
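In Java, with a Hive-enabled SparkSession (the Spark 2.x replacement for HiveContext), reading a Hive table looks roughly like this; the table name my_db.my_table is a placeholder:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadHiveTable {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("read-hive-table")   // illustrative name
                .enableHiveSupport()          // requires spark-hive on the classpath
                .getOrCreate();

        // Read a Hive table into a DataFrame; the table name is hypothetical
        Dataset<Row> df = spark.sql("SELECT * FROM my_db.my_table");
        df.show();

        spark.stop();
    }
}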
In an application, the SparkSession object is created programmatically with SparkSession.builder(); in the Spark shell, a SparkSession named spark is created for you by default. The SparkContext is not constructed separately but retrieved from the session via sparkSession.sparkContext.
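For example, in Java (again with placeholder names):

import org.apache.spark.SparkContext;
import org.apache.spark.sql.SparkSession;

public class ContextFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("context-example")   // illustrative name
                .getOrCreate();

        // The SparkContext is obtained from the session rather than constructed separately
        SparkContext sc = spark.sparkContext();
        System.out.println("Application id: " + sc.applicationId());

        spark.stop();
    }
}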
The API documentation describes it as: public class HiveContext extends SQLContext implements Logging — an instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath.
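If you go this route, a minimal hive-site.xml on the classpath might look like the sketch below; the metastore URI is a placeholder for your own environment:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder: point this at your Hive metastore service -->
    <value>thrift://localhost:9083</value>
  </property>
</configuration>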
Add the following dependency to your Maven project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
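The artifact suffix and version here assume a Spark 2.0.0 / Scala 2.11 build; they should match the Spark and Scala versions of your other Spark dependencies. With spark-hive on the classpath, the Hive classes that enableHiveSupport() checks for are available and the original program starts normally.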