Error while running PySpark DataProc Job due to python version

I created a Dataproc cluster using the following command:

gcloud dataproc clusters create datascience \
    --initialization-actions \
    gs://dataproc-initialization-actions/jupyter/jupyter.sh
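
(For reference, the job was submitted with a command along these lines; the script path here is only a placeholder:)

gcloud dataproc jobs submit pyspark gs://my-bucket/my_job.py --cluster datascience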

However, when I submit my PySpark job, I get the following error:

Exception: Python in worker has different version 3.4 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Any thoughts?

asked Jul 19 '18 by Kassem Shehady




3 Answers

This is due to a difference in Python versions between the master and the workers. By default, the Jupyter image installs the latest version of Miniconda, which uses Python 3.7. However, the workers still use the default Python 3.6.
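
One quick way to verify the mismatch is to compare the interpreters on the master and a worker directly (a sketch, assuming Dataproc's default node names of <cluster>-m and <cluster>-w-0):

gcloud compute ssh datascience-m --command 'python --version'
gcloud compute ssh datascience-w-0 --command 'python --version'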

Solution: specify the Miniconda version when creating the cluster, i.e. install Python 3.6 on the master node:

gcloud dataproc clusters create example-cluster --metadata=MINICONDA_VERSION=4.3.30
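
For example, combined with the Jupyter initialization action from the question (a sketch merging the two commands above):

gcloud dataproc clusters create datascience \
    --metadata=MINICONDA_VERSION=4.3.30 \
    --initialization-actions \
    gs://dataproc-initialization-actions/jupyter/jupyter.sh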

Note:

  • this may need updating later to provide a more sustainable way of managing the environment
answered Oct 01 '22 by brotich


UPDATE THE SPARK ENVIRONMENT TO USE PYTHON 3.7:

Open a new terminal and run the following command:

export PYSPARK_PYTHON=python3.7

This ensures that the worker nodes use Python 3.7 (the same as the driver) rather than the default Python 3.4.
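
Since the error message mentions both variables, it may also help to set the driver side explicitly, and to persist the settings in Spark's conf/spark-env.sh so they survive new shells (a sketch; the path assumes a standard Spark layout with SPARK_HOME set):

export PYSPARK_PYTHON=python3.7
export PYSPARK_DRIVER_PYTHON=python3.7
echo 'export PYSPARK_PYTHON=python3.7' >> "$SPARK_HOME/conf/spark-env.sh"
echo 'export PYSPARK_DRIVER_PYTHON=python3.7' >> "$SPARK_HOME/conf/spark-env.sh"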

DEPENDING ON WHICH PYTHON VERSIONS YOU HAVE, YOU MAY NEED TO INSTALL OR UPDATE ANACONDA:

(To install see: https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart)

Make sure you have Anaconda 4.1.0 or higher. Open a new terminal and check your conda version by typing:

conda --version


If you are below Anaconda 4.1.0, type conda update conda.

  1. Next, check whether the nb_conda_kernels library is installed by typing

conda list


  2. If you don’t see nb_conda_kernels, type

conda install nb_conda_kernels


  3. If you are using Python 2 and want a separate Python 3 environment, type the following

conda create -n py36 python=3.6 ipykernel

py36 is the name of the environment. You could name it anything you want.

Alternatively, if you are using Python 3 and want a separate Python 2 environment, you could type the following.

conda create -n py27 python=2.7 ipykernel

py27 is the name of the environment. It uses Python 2.7.

  4. Ensure the Python versions were installed successfully, then close the terminal. Open a new terminal and type pyspark. You should see the new environments appear. (A sketch of pointing PySpark at one of them follows below.)
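
To select one of the new environments for PySpark, activate it and export the interpreter paths before launching (a sketch using the py36 environment from step 3; source activate is the older conda syntax matching Anaconda 4.1-era installs):

source activate py36
export PYSPARK_PYTHON=$(which python)
export PYSPARK_DRIVER_PYTHON=$(which python)
pyspark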
answered Oct 01 '22 by Robert Singh


We've fixed it now -- thanks for the intermediate workaround, @brotich. Check out the discussion in #300.

PR #306 keeps Python at the same version that was already installed (3.6) and installs packages on all nodes, ensuring that the master and worker Python environments stay identical.

As a side effect, you can choose your Python version by passing an argument to the conda init action, e.g. --metadata 'CONDA_PACKAGES="python==3.5"'.

PR #311 pins Miniconda to a particular version (currently 4.5.4), so issues like this should be avoided in the future. You can use --metadata 'MINICONDA_VERSION=latest' to get the old behavior of always downloading the latest Miniconda.
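
Putting these together, a cluster created with the conda init action and pinned versions might look something like this (a sketch; the init-action paths are the conda scripts from the same initialization-actions repository, so double-check them against the repo):

gcloud dataproc clusters create example-cluster \
    --metadata 'MINICONDA_VERSION=4.5.4,CONDA_PACKAGES="python==3.5"' \
    --initialization-actions gs://dataproc-initialization-actions/conda/bootstrap-conda.sh,gs://dataproc-initialization-actions/conda/install-conda-env.sh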

answered Oct 01 '22 by Karthik Palaniappan