 

Amazon EMR PySpark module not found

I created an Amazon EMR cluster with Spark already installed. When I ssh into my cluster and run pyspark from the terminal, it opens the PySpark shell as expected.

I uploaded a file using scp, and when I try to run it with python FileName.py, I get an import error:

from pyspark import SparkContext
ImportError: No module named pyspark

How do I fix this?

Stephen Cheng asked Aug 12 '15 22:08



2 Answers

I added the following lines to ~/.bashrc on EMR 4.3:

export SPARK_HOME=/usr/lib/spark
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.XXX-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

Here py4j-0.XXX-src.zip is the py4j archive in your Spark Python library folder. Look in /usr/lib/spark/python/lib/ to find the exact version and replace XXX with that version number.

Run source ~/.bashrc and you should be good.
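
If you'd rather not hard-code the py4j version, here is a minimal Python sketch (assuming the /usr/lib/spark layout above) that locates the archive with glob and extends sys.path at runtime instead of editing ~/.bashrc:

import glob
import os
import sys

# Assumes the EMR 4.x layout where Spark is installed under /usr/lib/spark.
spark_home = os.environ.get("SPARK_HOME", "/usr/lib/spark")

# Find the bundled py4j archive without hard-coding its version number.
py4j_archives = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
if not py4j_archives:
    raise RuntimeError("no py4j archive found under %s/python/lib" % spark_home)

# Put Spark's Python sources and the py4j archive on the import path.
sys.path.insert(1, os.path.join(spark_home, "python"))
sys.path.insert(1, py4j_archives[0])

from pyspark import SparkContext  # should now import cleanly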

Bob Baxley answered Oct 16 '22 11:10


You probably need to add the PySpark directories to your Python path. I typically use a function like the following.

import os
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home

    # Add the PySpark directories to the Python path:
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'pyspark'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'build'))

    # If no Python binary is specified, use the currently running one:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python

Then, you can call the function before importing pyspark:

configure_spark('/path/to/spark/home')
from pyspark import SparkContext

Spark home on an EMR node should be something like /home/hadoop/spark. See https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 for more details.
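
As a quick sanity check, a sketch like the one below (assuming the Spark home paths mentioned above, and running Spark in local mode just to verify the import) builds a SparkContext and runs a trivial job:

configure_spark('/usr/lib/spark')  # or '/home/hadoop/spark', depending on your EMR version

from pyspark import SparkContext

# Run a trivial local job to confirm PySpark is importable and functional.
sc = SparkContext("local[*]", "import-check")
print(sc.parallelize(range(10)).sum())  # prints 45 if everything is wired up
sc.stop()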

santon answered Oct 16 '22 11:10