When running the following in a Python 3.5 Jupyter environment I get the error below. Any ideas on what is causing it?
import findspark
findspark.init()
Error:
IndexError                                Traceback (most recent call last)
<ipython-input-20-2ad2c7679ebc> in <module>()
      1 import findspark
----> 2 findspark.init()
      3
      4 import pyspark

/.../anaconda/envs/pyspark/lib/python3.5/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    132     # add pyspark to sys.path
    133     spark_python = os.path.join(spark_home, 'python')
--> 134     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
    135     sys.path[:0] = [spark_python, py4j]
    136

IndexError: list index out of range
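This is most likely because the SPARK_HOME environment variable is not set correctly on your system. The traceback shows findspark searching for a py4j-*.zip file under the python/lib directory of spark_home; when that directory is wrong, the glob returns an empty list and indexing it with [0] raises the IndexError. Alternatively, you can specify the Spark directory explicitly when you initialise findspark, like so: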
import findspark
findspark.init('/path/to/spark/home')
After that, it should all work!
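If it still fails, it can help to check whether the directory you are pointing at actually contains the py4j archive that the traceback is looking for. The following is a minimal sketch, not part of findspark: the helper names find_py4j_zips and diagnose are made up for illustration, and it only checks the SPARK_HOME environment variable or a path you pass in (findspark itself may also look in other locations).

```python
import os
from glob import glob


def find_py4j_zips(spark_home):
    """Return the py4j zip archives under <spark_home>/python/lib (may be empty)."""
    # Same pattern as the line shown in the traceback.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*.zip")
    return sorted(glob(pattern))


def diagnose(spark_home=None):
    """Describe whether spark_home looks like a usable Spark directory."""
    # Fall back to the environment variable when no path is given.
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return "SPARK_HOME is not set"
    if not os.path.isdir(spark_home):
        return "not a directory: " + spark_home
    if not find_py4j_zips(spark_home):
        return "no py4j-*.zip under " + os.path.join(spark_home, "python", "lib")
    return "ok"


if __name__ == "__main__":
    # Hypothetical usage: prints a one-line diagnosis for the current environment.
    print(diagnose())
```

Anything other than "ok" suggests the path handed to findspark.init() is not the root of a Spark installation.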