PySpark "does not exist in the JVM" error when initializing SparkContext


I am using Spark on EMR and writing a PySpark script. I am getting an error when trying to run:

from pyspark import SparkContext
sc = SparkContext()

This is the error:

File "pyex.py", line 5, in <module>
    sc = SparkContext()   File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)   File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 195, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)   File "/usr/local/lib/python3.4/site-packages/py4j/java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name)) py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

I found an answer stating that I need to import SparkContext, but that is not working either.

asked Nov 05 '18 by thebeancounter


People also ask

What is SparkContext in PySpark?

A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs and broadcast variables on that cluster. When you create a new SparkContext, at least the master and app name should be set, either through the named parameters here or through conf.
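
For example, a minimal sketch of creating a SparkContext with an explicit master and app name (the local[*] master URL and the app name here are only illustrative placeholders):

from pyspark import SparkConf, SparkContext

# At minimum, set the master URL and the application name.
conf = SparkConf().setMaster("local[*]").setAppName("example-app")
sc = SparkContext(conf=conf)

print(sc.version)  # quick check that the context came up
sc.stop()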

Why does "getEncryptionEnabled does not exist in the JVM" happen?

PySpark raises "getEncryptionEnabled does not exist in the JVM" when the Spark environment variables are not set correctly. Check that you have the environment variables set in your .bashrc file; for Unix and Mac they should look like the exports shown in the last answer below.
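
As a quick sanity check, you can print the relevant variables from Python before creating a context (a small sketch; whatever paths it prints depend on your own installation):

import os

# SPARK_HOME should point at the Spark installation; PYTHONPATH should include
# Spark's python/ directory and the bundled py4j zip.
for name in ("SPARK_HOME", "PYTHONPATH", "PATH"):
    print(name, "=", os.environ.get(name, "<not set>"))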


3 Answers

PySpark recently released 2.4.0, but there is no stable Spark release that coincides with this new version yet. Try downgrading to PySpark 2.3.2; this fixed it for me.

Edit: to be clearer, your PySpark version needs to be the same as the version of Apache Spark that is downloaded, or you may run into compatibility issues.

Check the version of pyspark with:

pip freeze
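
You can also compare the two versions directly (a small sketch; it assumes pyspark is importable and that spark-submit is on your PATH):

import subprocess

import pyspark

# Version of the pip/conda pyspark package
print("pyspark:", pyspark.__version__)

# Version of the Spark distribution itself (spark-submit prints it to stderr)
subprocess.run(["spark-submit", "--version"])

The two should report the same Spark version.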

answered Sep 22 '22 by svw


I just had a fresh pyspark installation on my Windows device and was having the exact same issue. What seems to have helped is the following:

Go to your System Environment Variables and add a PYTHONPATH variable with the following value: %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH% (just check which py4j version you have in your spark/python/lib folder).

I think this works because when I installed pyspark using conda, it also pulled in a py4j version that may not be compatible with my specific version of Spark, whereas Spark bundles its own py4j under python/lib. A runtime alternative that points Python at Spark's bundled copy is sketched below.
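
If you would rather not touch the system variables, the same idea can be sketched at runtime by putting Spark's own copies first on sys.path (this assumes SPARK_HOME is set and that exactly one py4j-*-src.zip exists under %SPARK_HOME%\python\lib):

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]  # raises KeyError if SPARK_HOME is not set

# Prefer Spark's bundled py4j and python package over whatever conda installed.
sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])
sys.path.insert(0, os.path.join(spark_home, "python"))

from pyspark import SparkContext
sc = SparkContext()

This is essentially what the findspark approach in the answer below automates.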

answered Sep 22 '22 by mugurkt


You need to set the following environment variables to set the Spark path and the Py4j path, for example in ~/.bashrc:

export SPARK_HOME=/home/hadoop/spark-2.1.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH

And use findspark at the top of your file:

import findspark
findspark.init()
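
Putting it together, a minimal sketch of a script that calls findspark before creating the context (it assumes the findspark package is installed, e.g. via pip, and that SPARK_HOME is set as above):

import findspark
findspark.init()  # locates Spark via SPARK_HOME and fixes up sys.path

from pyspark import SparkContext

sc = SparkContext(appName="findspark-example")
print(sc.version)
sc.stop()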

answered Sep 24 '22 by Роберт Воропаев