
TypeError: 'JavaPackage' object is not callable

I get this error when I call the Spark SQL API hiveContext.sql():

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, HiveContext

conf = SparkConf().setAppName("spark_sql")

sc = SparkContext(conf=conf)
hc = HiveContext(sc)

#rdd = sc.textFile("test.txt")
sqlContext = SQLContext(sc)
res = hc.sql("use teg_uee_app")  # the TypeError is raised here
#for each in res.collect():
#    print(each[0])
sc.stop()

I got the following error:

  File "spark_sql.py", line 23, in <module>
    res = hc.sql("use teg_uee_app")
  File "/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
    self._scala_HiveContext = self._get_hive_ctx()
  File "/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
    return self._jvm.HiveContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable

How do I set SPARK_CLASSPATH or use SparkContext.addFile to fix this? I have no idea.

林雅峰 asked Jul 05 '16 13:07


2 Answers

This may help: when using HiveContext, I have to pass three DataNucleus jars to spark-submit with --jars:

spark-submit --jars /usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar ...

Of course the paths and versions depend on your cluster setup.
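Because the jar versions differ between installations, it can be easier to build the --jars value from whatever is actually in the lib directory than to type the file names by hand. The following is only a sketch: build_jars_arg is a hypothetical helper, not part of Spark, and /usr/lib/spark/lib is simply the directory used in the command above.

```python
import glob
import os


def build_jars_arg(lib_dir, pattern="datanucleus-*.jar"):
    """Return a comma-separated list of matching jar paths for --jars.

    Hypothetical helper: lib_dir and pattern are assumptions about where
    your Spark installation keeps the DataNucleus jars.
    """
    jars = sorted(glob.glob(os.path.join(lib_dir, pattern)))
    return ",".join(jars)


if __name__ == "__main__":
    # Directory taken from the command above; adjust it for your cluster.
    jars = build_jars_arg("/usr/lib/spark/lib")
    if jars:
        print("spark-submit --jars " + jars + " spark_sql.py")
    else:
        print("no datanucleus jars found")
```

If the helper prints "no datanucleus jars found", the jars probably live somewhere else in your installation, so the directory needs to be changed before the command is usable.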

Christian Z. answered Nov 15 '22 16:11


In my case this turned out to be a classpath issue: a Hadoop jar on the classpath was a different version from the Hadoop I was actually running.

Make sure you only set the executor and/or driver classpaths in one place and that there's no system-wide default applied somewhere such as .bashrc or Spark's conf/spark-env.sh.
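One way to check for duplicate settings is to list every place a classpath is being set. The sketch below looks at environment variables and at the text of config files you pass in. find_classpath_settings is a hypothetical helper, and the list of variable and property names is an assumption about the usual suspects (SPARK_CLASSPATH, SPARK_DIST_CLASSPATH, HADOOP_CLASSPATH, spark.driver.extraClassPath, spark.executor.extraClassPath), not an exhaustive list.

```python
import os
import re

# Names that can each add entries to the classpath (assumed, not exhaustive).
CLASSPATH_ENV_VARS = ("SPARK_CLASSPATH", "SPARK_DIST_CLASSPATH", "HADOOP_CLASSPATH")
CLASSPATH_PROPS = ("spark.driver.extraClassPath", "spark.executor.extraClassPath")


def find_classpath_settings(env, config_texts):
    """Return (source, name) pairs for every place a classpath is set.

    env: mapping of environment variables, e.g. os.environ
    config_texts: mapping of a file label to that file's contents
    """
    found = []
    for name in CLASSPATH_ENV_VARS:
        if env.get(name):
            found.append(("environment", name))
    names = CLASSPATH_ENV_VARS + CLASSPATH_PROPS
    for label, text in config_texts.items():
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # skip commented-out settings
            for name in names:
                # Match the name only as a whole token, not inside a longer one.
                if re.search(r"(?<![\w.])" + re.escape(name) + r"(?![\w.])", stripped):
                    found.append((label, name))
    return found
```

To use it, read files such as ~/.bashrc and Spark's conf/spark-env.sh into the config_texts mapping and pass os.environ as env. More than one hit for the driver or the executor suggests the classpath is being set in several places.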

dskrvk answered Nov 15 '22 17:11