 

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

I am currently on JRE 1.8.0_181, Python 3.6.4, Spark 2.3.2.

I am trying to execute the following code in Python:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Basics').getOrCreate()

This fails with the following error:

>>> spark = SparkSession.builder.appName('Basics').getOrCreate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\sql\session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "C:\Tools\Anaconda3\lib\site-packages\pyspark\context.py", line 195, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)
  File "C:\Tools\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Does anyone have an idea what the potential issue could be here?

Appreciate any help or feedback here. Thank you!

asked Nov 08 '18 by bvkclear



3 Answers

As outlined in pyspark error does not exist in the jvm error when initializing SparkContext, adding the PYTHONPATH environment variable resolved this issue. On Windows, set it to:

%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

(just check which py4j version you have in your spark/python/lib folder).
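
If you would rather do the wiring from Python than in the system settings, here is a minimal sketch (not the answer's exact method) that puts Spark's python folder and the bundled py4j zip on the search path before importing pyspark. The C:\spark location is a placeholder; adjust it to your installation and let glob pick up whatever py4j version your build ships with.

# A minimal sketch: set SPARK_HOME and the module search path from Python
# before importing pyspark. C:\spark is a placeholder install location.
import glob
import os
import sys

spark_home = r"C:\spark"  # placeholder -- point this at your Spark install
os.environ["SPARK_HOME"] = spark_home
sys.path.insert(0, os.path.join(spark_home, "python"))
# Pick up whichever py4j-<version>-src.zip this Spark build ships with.
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Basics").getOrCreate()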

answered by bvkclear


Using findspark should also solve the problem:

Install findspark

$ pip install findspark

In your code, use:

import findspark
findspark.init() 

Optionally, you can pass your Spark home path to the init method: findspark.init("/path/to/spark")
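
For reference, here is the question's snippet with findspark added, as a complete sketch (the "/path/to/spark" argument is optional and only needed if findspark cannot locate Spark on its own):

import findspark

findspark.init()  # or findspark.init("/path/to/spark")

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Basics').getOrCreate()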

answered by sm7


Solution #1. Check your environment variables

You are getting “py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM” because the Spark environment variables are not set correctly.

Check if you have your environment variables set correctly in your .bashrc file. For Unix and Mac, the variables should look something like the lines below. You can find the .bashrc file in your home directory.

Note: Do not copy and paste the lines below as-is, as your Spark version might be different from the one mentioned here.

export SPARK_HOME=/opt/spark-3.0.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH

If you are running on Windows, open the environment variables window, and add/update the variables below.

SPARK_HOME  =>  C:\apps\opt\spark-3.0.0-bin-hadoop2.7
PYTHONPATH  =>  %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9-src.zip;%PYTHONPATH%
PATH  =>  %SPARK_HOME%\bin;%SPARK_HOME%\python;%PATH%

After setting the environment variables, restart your tool or command prompt.
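
A quick way to sanity-check the result (a sketch, not part of the original answer): print the variables as the Python process sees them, and confirm that every entry on PYTHONPATH actually exists on disk.

# Sanity check: inspect the environment variables from inside Python.
import os

for var in ("SPARK_HOME", "PYTHONPATH", "PATH"):
    print(var, "=>", os.environ.get(var))

# Each PYTHONPATH entry (e.g. the py4j zip) should point at a real path.
for entry in os.environ.get("PYTHONPATH", "").split(os.pathsep):
    if entry:
        print(entry, "exists" if os.path.exists(entry) else "MISSING")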

Solution #2. Using findspark

Install the findspark package by running $pip install findspark and add the following lines to your PySpark program:

import findspark
findspark.init() 
# you can also pass spark home path to init() method like below
# findspark.init("/path/to/spark")

Solution #3. Copying the pyspark and py4j modules to Anaconda lib

Sometimes, after changing/upgrading your Spark version, you may get this error because of a version mismatch between the pyspark that ships with Spark and the pyspark available in the Anaconda lib. To correct it:

Note: copy the specified folders from inside the zip files, and make sure you have your environment variables set correctly as mentioned in Solution #1.

Copy the py4j folder from:

C:\apps\opt\spark-3.0.0-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\

to

C:\Programdata\anaconda3\Lib\site-packages\

And copy the pyspark folder from:

C:\apps\opt\spark-3.0.0-bin-hadoop2.7\python\lib\pyspark.zip\

to

C:\Programdata\anaconda3\Lib\site-packages\

Sometimes, you may need to restart your system in order for the environment variables to take effect.
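
If you would rather script the copy than do it by hand, here is a minimal sketch using Python's zipfile module; the paths are the examples from this answer and will differ on your machine.

# A sketch of Solution #3: extract the py4j and pyspark folders from the
# zips that ship with Spark straight into the Anaconda site-packages.
import os
import zipfile

# Example paths from this answer -- adjust both to your installation.
spark_lib = r"C:\apps\opt\spark-3.0.0-bin-hadoop2.7\python\lib"
site_packages = r"C:\Programdata\anaconda3\Lib\site-packages"

for archive in ("py4j-0.10.9-src.zip", "pyspark.zip"):
    with zipfile.ZipFile(os.path.join(spark_lib, archive)) as zf:
        zf.extractall(site_packages)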

Credits to: https://sparkbyexamples.com/pyspark/pyspark-py4j-protocol-py4jerror-org-apache-spark-api-python-pythonutils-jvm/

answered by mounirboulwafa