
findspark.init() IndexError: list index out of range error

When I run the following in a Python 3.5 Jupyter environment, I get the error below. Any ideas what is causing it?

    import findspark
    findspark.init()

Error:

    IndexError                                Traceback (most recent call last)
    <ipython-input-20-2ad2c7679ebc> in <module>()
          1 import findspark
    ----> 2 findspark.init()
          3
          4 import pyspark

    /.../anaconda/envs/pyspark/lib/python3.5/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
        132     # add pyspark to sys.path
        133     spark_python = os.path.join(spark_home, 'python')
    --> 134     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
        135     sys.path[:0] = [spark_python, py4j]
        136

    IndexError: list index out of range
tjb305 asked Feb 14 '17 10:02


1 Answer

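This is most likely because the SPARK_HOME environment variable is not set correctly on your system. The traceback shows findspark looking for a py4j-*.zip file under the python/lib directory of the Spark home; when that directory does not contain one, glob returns an empty list and indexing it with [0] raises the IndexError. Either fix SPARK_HOME, or pass the path to your Spark installation directly when initialising findspark: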

    import findspark
    findspark.init('/path/to/spark/home')

After that, it should all work!
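
If you want to confirm that a given directory is a usable Spark home before calling findspark, a rough check is to repeat the same glob that appears in the traceback. This is only a sketch: check_spark_home is a hypothetical helper name, not part of findspark, and it assumes the py4j archive lives under python/lib as the traceback suggests.

```python
import os
from glob import glob


def check_spark_home(spark_home=None):
    """Return the py4j zip files found under <spark_home>/python/lib.

    Hypothetical helper, not part of findspark.
    """
    # Fall back to the SPARK_HOME environment variable when no path is given.
    if spark_home is None:
        spark_home = os.environ.get("SPARK_HOME", "")
    if not spark_home:
        return []
    # Same pattern as the findspark line shown in the traceback.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*.zip")
    return sorted(glob(pattern))


if __name__ == "__main__":
    matches = check_spark_home()
    if matches:
        print("Found py4j archive:", matches[0])
    else:
        print("No py4j-*.zip found under the Spark home; check the path.")
```

An empty result means findspark.init() would hit the same IndexError for that path, so it is worth pointing it at the directory that actually contains python/lib.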

gregoltsov answered Sep 28 '22 12:09