pyspark ImportError: cannot import name accumulators

Question

Goal: I am trying to get Apache Spark's pyspark to be interpreted correctly within my PyCharm IDE.

Problem: I currently receive the following error:

ImportError: cannot import name accumulators

I was following this blog post to help me through the process: http://renien.github.io/blog/accessing-pyspark-pycharm/

Because my code was taking the except path, I removed the try/except just to see what the exact error was.

Prior to this I received the following error:

ImportError: No module named py4j.java_gateway

This was fixed simply by running 'sudo pip install py4j' in bash.

My code currently looks like the following chunk:

import os
import sys

# Path for spark source folder
os.environ['SPARK_HOME']="[MY_HOME_DIR]/spark-1.2.0"

# Append pyspark to Python Path
sys.path.append("[MY_HOME_DIR]/spark-1.2.0/python/")

try:
    from pyspark import SparkContext
    print("Successfully imported Spark Modules")

except ImportError as e:
    # Format the message so Python 2 prints text rather than a tuple
    print("Can not import Spark Modules: {0}".format(e))
    sys.exit(1)

My Questions:
1. What is the source of this error? What is the cause?
2. How do I remedy the issue so I can run pyspark in my PyCharm editor?

NOTE: The current interpreter I use in pycharm is Python 2.7.8 (~/anaconda/bin/python)

Thanks ahead of time!

Don

ben.ko · Accepted Answer

The problem is related to the PYTHONPATH variable, which specifies the Python module search path.

Since pyspark mostly runs fine when started from its own launcher, you can refer to the pyspark shell script and see that it sets PYTHONPATH like this:

PYTHONPATH=/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark/python

My environment is the Cloudera QuickStart VM 5.3.
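
If you would rather set this up from inside your script than in the shell, something like the following might work. It is only a rough sketch: the helper names spark_python_paths and add_spark_to_sys_path are made up for illustration, and it assumes your Spark install keeps the bundled py4j zip under python/lib the way mine does.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries the PYTHONPATH above contains, derived from spark_home.

    Hypothetical helper: assumes the bundled py4j zip lives in python/lib.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it by pattern.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Put the Spark Python paths at the front of sys.path, skipping duplicates."""
    for path in reversed(spark_python_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

You would call add_spark_to_sys_path(os.environ['SPARK_HOME']) before the "from pyspark import SparkContext" line. In PyCharm, adding the same two entries to PYTHONPATH in the run configuration's environment variables should have the same effect.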

Hope this helps.

matt2000 · Answer

This looks to me like a circular-dependency bug.

In [MY_HOME_DIR]/spark-1.2.0/python/pyspark/context.py, remove or comment out the line

from pyspark import accumulators

It's about 6 lines of code from the top.

I filed an issue with the Spark project here:

https://issues.apache.org/jira/browse/SPARK-4974
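
If you want to check where that line sits in your copy before editing anything, a small read-only helper along these lines could do it. This is just a sketch: find_statement_lines is a name I made up, and it only reports line numbers, it does not change the file.

```python
def find_statement_lines(source_path, statement="from pyspark import accumulators"):
    """Return the 1-based numbers of lines whose stripped text equals statement.

    Hypothetical helper for illustration; it reads the file and never writes to it.
    """
    matches = []
    with open(source_path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip() == statement:
                matches.append(number)
    return matches
```

Point it at your own context.py path. An empty list would mean your Spark version does not contain that exact line, in which case this workaround probably does not apply to you.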

Tags: python, pycharm, apache-spark

AntiPawn79
