I am using a standalone Apache Spark 2.0.0 cluster with two nodes, and I have not installed Hive. I get the following error when creating a DataFrame.
from pyspark import SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not the top-level pyspark package

sqlContext = SQLContext(sc)  # sc is the SparkContext provided by the pyspark shell
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
<ipython-input-9-63bc4f21f23e> in <module>()
----> 1 sqlContext.createDataFrame(l).collect()
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
297 Py4JJavaError: ...
298 """
--> 299 return self.sparkSession.createDataFrame(data, schema, samplingRatio)
300
301 @since(1.3)
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.pyc in createDataFrame(self, data, schema, samplingRatio)
522 rdd, schema = self._createFromLocal(map(prepare, data), schema)
523 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
--> 524 jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
525 df = DataFrame(jdf, self._wrapped)
526 df._schema = schema
/home/mok/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
931 answer = self.gateway_client.send_command(command)
932 return_value = get_return_value(
--> 933 answer, self.gateway_client, self.target_id, self.name)
934
935 for temp_arg in temp_args:
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.pyc in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco
IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'
So should I install Hive, or can this be fixed by editing the configuration?
Spark SQL does not require a Hive metastore under the covers, and it defaults to the in-memory, non-Hive catalog (unless you're in spark-shell, which does the opposite). The default external catalog implementation is controlled by the internal property spark.sql.catalogImplementation, which can be either in-memory or hive.
Class HiveContext: A variant of Spark SQL that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath. It supports running both SQL and HiveQL commands.
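If you do not actually need Hive, you can keep Spark on the non-Hive catalog by setting that property yourself when you build the session. A minimal sketch, assuming no SparkSession has been created yet in the process (the catalog implementation cannot be switched afterwards); the application name is just a placeholder:

from pyspark.sql import SparkSession

# Build a session that explicitly uses the in-memory catalog, so Spark SQL
# never tries to locate Hive jars for a metastore connection.
spark = (SparkSession.builder
         .appName("no-hive-example")   # placeholder name
         .config("spark.sql.catalogImplementation", "in-memory")
         .getOrCreate())

print(spark.conf.get("spark.sql.catalogImplementation"))   # expect 'in-memory'
print(spark.createDataFrame([("Alice", 1)]).collect())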
If you have several Java versions installed, you'll have to figure out which one Spark is using. I did this by trial and error, starting with
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
and ending with
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'
I had the same issue and fixed it by using Java 8. Make sure you install JDK 8 and set the environment variables accordingly.
Do not use Java 11 with Spark / pyspark 2.4.
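If you start PySpark from a plain Python script, one way to make sure the driver JVM is launched with JDK 8 is to export JAVA_HOME before the SparkContext is created, since the Spark launch scripts read it when starting the JVM. A sketch only; the JDK path is an example for Debian/Ubuntu and the application name is a placeholder:

import os

# Point the Spark launcher at JDK 8 before any JVM is started.
# Example path for Debian/Ubuntu; adjust to your own installation.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="java8-check")   # placeholder app name
sqlContext = SQLContext(sc)
print(sqlContext.createDataFrame([("Alice", 1)]).collect())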