
How to create SparkSession with Hive support (fails with "Hive classes are not found")?

I'm getting an error while trying to run the following code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}

Output:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

How can it be resolved?

asked Sep 12 '16 by Subhadip Majumder

People also ask

How do I create a new SparkSession?

To create a SparkSession in Scala or Python, use the builder pattern: call SparkSession.builder() and then getOrCreate(). If a SparkSession already exists, getOrCreate() returns it; otherwise a new one is created.
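For example, a minimal Java sketch; the app name and local master are placeholder assumptions, not part of the original answer:

import org.apache.spark.sql.SparkSession;

public class CreateSession {
    public static void main(String[] args) {
        // getOrCreate() returns the running SparkSession if one exists,
        // otherwise it builds a new one from this configuration.
        SparkSession spark = SparkSession
            .builder()
            .appName("example")   // hypothetical app name
            .master("local[*]")   // assumption: run locally for testing
            .getOrCreate();

        spark.stop();
    }
}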

How do I connect PySpark to Hive?

Spark provides the HiveContext class to access Hive tables directly. First, import it with "from pyspark.sql import HiveContext". Then use this class to create a context for Hive and read Hive tables into a Spark DataFrame.
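Since the question is in Java, here is a rough Java equivalent on Spark 2.x, where a Hive-enabled SparkSession replaces HiveContext; the app name, master, and table name below are placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadHiveTable {
    public static void main(String[] args) {
        // enableHiveSupport() reads hive-site.xml from the classpath,
        // just as HiveContext does.
        SparkSession spark = SparkSession
            .builder()
            .appName("hive-read")   // hypothetical app name
            .master("local[*]")     // assumption: local test run
            .enableHiveSupport()
            .getOrCreate();

        // "default.my_table" is a placeholder; substitute a real Hive table.
        Dataset<Row> df = spark.sql("SELECT * FROM default.my_table");
        df.show();
    }
}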

How do you get Spark from SparkSession?

In Spark or PySpark, a SparkSession object is created programmatically using SparkSession.builder(); if you are using the Spark shell, a SparkSession object named "spark" is created for you by default as an implicit object. The SparkContext is then retrieved from the session object via sparkSession.sparkContext.
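In Java the same retrieval looks like this sketch (app name and master are assumptions):

import org.apache.spark.SparkContext;
import org.apache.spark.sql.SparkSession;

public class GetContext {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("context-example")   // hypothetical app name
            .master("local[*]")           // assumption: local test run
            .getOrCreate();

        // The underlying SparkContext is exposed on the session.
        SparkContext sc = spark.sparkContext();
        System.out.println(sc.appName());

        spark.stop();
    }
}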

What is Spark HiveContext?

public class HiveContext extends SQLContext implements Logging. An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath.
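A minimal Java sketch of that older API; note that HiveContext is deprecated as of Spark 2.0 in favor of SparkSession with enableHiveSupport(), and the app name and master below are assumptions:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("hive-context")   // hypothetical app name
            .setMaster("local[*]");       // assumption: local test run
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // HiveContext picks up hive-site.xml from the classpath.
        HiveContext hiveContext = new HiveContext(jsc);
        hiveContext.sql("SHOW TABLES").show();
    }
}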


1 Answer

Add the following dependency to your Maven project.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
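The artifact's Scala suffix (_2.11 here) and version should match the Spark build you run against. With spark-hive on the classpath, enableHiveSupport() no longer throws and the original program should start; a minimal sketch, assuming a local master for testing:

import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("hivetest")   // hypothetical app name
            .master("local[*]")    // assumption: local test run
            .enableHiveSupport()   // works once spark-hive is on the classpath
            .getOrCreate();

        spark.stop();
    }
}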
answered Sep 29 '22 by abaghel