I created an Amazon EMR cluster with Spark already installed. When I SSH into the cluster and run pyspark from the terminal, it drops into the PySpark shell just fine.
I uploaded a script using scp, but when I try to run it with python FileName.py, I get an import error:
from pyspark import SparkContext
ImportError: No module named pyspark
How do I fix this?
You can access the Spark shell by connecting to the master node with SSH and invoking spark-shell. For more information about connecting to the master node, see Connect to the Master Node Using SSH in the Amazon EMR Management Guide.
Add the following lines to your ~/.bashrc (for EMR 4.3):
export SPARK_HOME=/usr/lib/spark
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.XXX-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
Here py4j-0.XXX-src.zip is the py4j archive in your Spark Python library folder. Look in /usr/lib/spark/python/lib/ to find the exact version, and replace the XXX with that version number.
Run source ~/.bashrc and you should be good.
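To check that the path is picked up, a minimal sketch like the one below should now run with plain python instead of the pyspark shell (the file name test_pyspark.py is hypothetical):

# test_pyspark.py -- confirm pyspark is importable outside the pyspark shell
from pyspark import SparkContext

sc = SparkContext(appName="ImportCheck")
print(sc.parallelize([1, 2, 3]).sum())  # should print 6
sc.stop()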
You probably need to add the pyspark files to the path. I typically use a function like the following.
import os
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home

    # Add the PySpark directories to the Python path:
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'pyspark'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'build'))

    # If a Python binary isn't specified, use the currently running one:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python
Then, you can call the function before importing pyspark:
configure_spark('/path/to/spark/home')
from pyspark import SparkContext
Spark home on an EMR node should be something like /home/hadoop/spark. See https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 for more details.
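Putting it together, a script on the cluster might look like the sketch below. It reuses configure_spark as defined above; the Spark home path, app name, and S3 bucket are assumptions, so adjust them for your AMI version and data:

# run_job.py -- hypothetical end-to-end usage of configure_spark
configure_spark('/home/hadoop/spark')  # or /usr/lib/spark on emr-4.x

from pyspark import SparkContext

sc = SparkContext(appName="WordCount")
counts = (sc.textFile('s3://my-bucket/input.txt')  # hypothetical bucket
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.take(10))
sc.stop()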