Why can't PySpark find py4j.java_gateway?

I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module into ipython. I'm getting the following error:

In [1]: import pyspark
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark

/usr/local/spark/python/pyspark/__init__.py in <module>()
     61
     62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
     64 from pyspark.sql import SQLContext
     65 from pyspark.rdd import RDD

/usr/local/spark/python/pyspark/context.py in <module>()
     28 from pyspark.conf import SparkConf
     29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
     31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
     32     PairDeserializer, CompressedSerializer

/usr/local/spark/python/pyspark/java_gateway.py in <module>()
     24 from subprocess import Popen, PIPE
     25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
     27
     28

ImportError: No module named py4j.java_gateway

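In my environment (using Docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that it adds a few things to PYTHONPATH, and you need the same entries when you import pyspark from a plain ipython session: Spark's python directory and the py4j zip file bundled with Spark.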


That worked in ipython for me.

Update: as noted in the comments, the name of the py4j zip file changes with each Spark release, so check the python/lib directory of your Spark installation for the exact name.
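
If you would rather not hard-code the zip name, the same idea can be sketched from inside Python by globbing for it. This is only a minimal sketch under some assumptions: the helper names spark_python_paths and add_spark_to_sys_path are made up for illustration, the default /usr/local/spark is taken from the traceback above, and the py4j zip is assumed to live under python/lib with a name like py4j-*-src.zip. Adjust these to your installation.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the path entries PySpark needs: its python dir and any py4j zip found."""
    python_dir = os.path.join(spark_home, "python")
    # Glob rather than hard-code the version, since the zip name varies by release.
    # Assumption: the zip sits in python/lib and is named py4j-<version>-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend the PySpark entries to sys.path and return the ones that were added."""
    # Assumption: fall back to /usr/local/spark, the location shown in the traceback.
    spark_home = spark_home or os.environ.get("SPARK_HOME", "/usr/local/spark")
    added = [p for p in spark_python_paths(spark_home) if p not in sys.path]
    sys.path[:0] = added
    return added
```

Calling add_spark_to_sys_path() at the top of an ipython session and then running import pyspark should have the same effect as exporting the entries in PYTHONPATH before starting ipython. If the returned list contains only the python directory, no py4j zip matched the pattern and the import will still fail.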

