I recently installed PySpark and the installation completed correctly. But when I run the following simple program in Python, I get an error.
>>> from pyspark import SparkContext
>>> sc = SparkContext()
>>> data = range(1,1000)
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
While running the last line I get an error whose key part seems to be:
[Stage 0:> (0 + 0) / 4]
18/01/15 14:36:32 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
I have the following variables in .bashrc
export SPARK_HOME=/opt/spark
export PYTHONPATH=$SPARK_HOME/python3
I am using Python 3.
Before starting PySpark, you need to set the following environment variables to point to the Spark path and the Py4j path. To set them globally, put them in the .bashrc file, then source it so they take effect in the current shell.
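A minimal .bashrc sketch of those exports, assuming Spark is unpacked at /opt/spark as in the question; the Py4j zip version below is an assumption, so check the actual file name under $SPARK_HOME/python/lib:

export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
# The py4j zip version here is a placeholder -- use the one shipped with your Spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
# Reload .bashrc so the variables take effect in the current shell
source ~/.bashrc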
Spark Configuration

Spark properties control most application parameters and can be set by using a SparkConf object, or through Java system properties. Environment variables can be used to set per-machine settings, such as the IP address, through the conf/spark-env.sh script on each node.
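A hedged sketch of those two mechanisms (the file name my_app.py and the IP address are made-up examples, not taken from the question):

# 1) Spark properties, set per application on the command line via spark-submit:
spark-submit --master local[4] --conf spark.executor.memory=2g my_app.py
# 2) Per-machine settings, set in conf/spark-env.sh on each node:
export SPARK_LOCAL_IP=192.168.1.10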
PYTHONPATH is an environment variable you can set to add extra directories where Python will look for modules and packages. For most installations you should not need to set it, since Python already knows where to find its standard library.
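A quick way to check that it is set usefully here (just a sketch; it only confirms that the pyspark package resolves to the Spark distribution you expect):

python3 -c "import pyspark; print(pyspark.__file__)"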
If you only want to change the Python version for the current session, you can start PySpark with the following command:

PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
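Adapted to the asker's situation (driver running Python 3.5), both variables should point at the same Python 3 interpreter; the /usr/bin/python3 path and the local master below are assumptions used only for illustration:

# Driver and workers use the same Python 3 interpreter (path is an assumption)
PYSPARK_DRIVER_PYTHON=/usr/bin/python3 PYSPARK_PYTHON=/usr/bin/python3 pyspark --master local[4]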
By the way, if you use PyCharm, you could add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the run/debug configurations, per the image below.
You should set the following environment variables in $SPARK_HOME/conf/spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/python
If spark-env.sh doesn't exist, you can rename spark-env.sh.template to spark-env.sh.
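A sketch of that step on the asker's layout (SPARK_HOME=/opt/spark comes from the question; pointing both variables at /usr/bin/python3 instead of /usr/bin/python is an adjustment for the Python 3.5 driver, and that path is an assumption):

# Create spark-env.sh from the shipped template
cp /opt/spark/conf/spark-env.sh.template /opt/spark/conf/spark-env.sh
# Make driver and workers use the same Python 3 interpreter (path is an assumption)
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> /opt/spark/conf/spark-env.sh
echo 'export PYSPARK_DRIVER_PYTHON=/usr/bin/python3' >> /opt/spark/conf/spark-env.sh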