Google is littered with solutions to this problem, but unfortunately, even after trying all of them, I am unable to get it working, so please bear with me and see if something strikes you.
OS: macOS
Spark: 1.6.3 (Scala 2.10)
Jupyter Notebook: 4.4.0
Python: 2.7
Scala: 2.12.1
I was able to successfully install and run Jupyter Notebook. Next, I tried configuring it to work with Spark, for which I installed the Spark interpreter using Apache Toree. Now when I try running any RDD operation in the notebook, the following error is thrown:
Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/private/tmp/hadoop-xxxx/nm-local-dir/usercache/xxxx/filecache/33/spark-assembly-1.6.3-hadoop2.2.0.jar
Things already tried:
1. Set PYTHONPATH in .bash_profile
2. Am able to import pyspark in the Python CLI locally
3. Have tried updating the interpreter kernel.json to the following:
{
  "language": "python",
  "display_name": "Apache Toree - PySpark",
  "env": {
    "__TOREE_SPARK_OPTS__": "",
    "SPARK_HOME": "/Users/xxxx/Desktop/utils/spark",
    "__TOREE_OPTS__": "",
    "DEFAULT_INTERPRETER": "PySpark",
    "PYTHONPATH": "/Users/xxxx/Desktop/utils/spark/python:/Users/xxxx/Desktop/utils/spark/python/lib/py4j-0.9-src.zip:/Users/xxxx/Desktop/utils/spark/python/lib/pyspark.zip:/Users/xxxx/Desktop/utils/spark/bin",
    "PYSPARK_SUBMIT_ARGS": "--master local --conf spark.serializer=org.apache.spark.serializer.KryoSerializer",
    "PYTHON_EXEC": "python"
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/apache_toree_pyspark/bin/run.sh",
    "--profile",
    "{connection_file}"
  ]
}
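As an extra diagnostic, a small script like the one below (a sketch, not part of the kernel config) can be run with the same interpreter the workers use (/usr/bin/python, per the error above) to show whether pyspark is importable there and what sys.path actually contains:

# Run with the worker interpreter from the error message, e.g.:
#   /usr/bin/python check_pyspark.py
import sys

print(sys.executable)          # which Python is actually running
for p in sys.path:             # where it will look for modules
    print(p)

try:
    import pyspark
    print("pyspark found at: " + pyspark.__file__)
except ImportError as e:
    print("pyspark not importable: " + str(e))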
Use the findspark library to bypass the whole environment setup process. See this link for more information: https://github.com/minrk/findspark
Use it as below:
import findspark
findspark.init('/path_to_spark/spark-x.x.x-bin-hadoopx.x')
from pyspark.sql import SparkSession
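For a quick end-to-end check after findspark.init, a small job like the one below should complete without the "No module named pyspark" error. This is a minimal sketch: the path is the SPARK_HOME from the question, and it uses SparkContext rather than SparkSession because SparkSession only exists on Spark 2.0+, while the question is on 1.6.3.

import findspark
findspark.init('/Users/xxxx/Desktop/utils/spark')  # your SPARK_HOME

from pyspark import SparkContext

# trivial RDD job; forces work onto the Python workers
sc = SparkContext("local[*]", "findspark-check")
print(sc.parallelize(range(10)).sum())
sc.stop()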
I used the following commands on Windows to link pyspark to Jupyter (on *nix, use export instead of set). Type the code below in CMD/Command Prompt:
set PYSPARK_DRIVER_PYTHON=ipython
set PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark
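Once the notebook launched by pyspark is up, a SparkContext already exists in the session; a first cell like this (just a sketch, assuming the default local master) confirms that the Python workers can import pyspark:

from pyspark import SparkContext

# getOrCreate() reuses the context the pyspark launcher already created
sc = SparkContext.getOrCreate()
print(sc.parallelize(range(100)).map(lambda x: x * 2).take(5))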