I would like to run PySpark from a Jupyter notebook. I downloaded and installed Anaconda, which includes Jupyter, and ran the following lines:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
When I run them, I get the following error:
ImportError Traceback (most recent call last)
<ipython-input-3-98c83f0bd5ff> in <module>()
----> 1 from pyspark import SparkConf, SparkContext
2 conf = SparkConf().setMaster("local").setAppName("My App")
3 sc = SparkContext(conf = conf)
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\__init__.py in <module>()
39
40 from pyspark.conf import SparkConf
---> 41 from pyspark.context import SparkContext
42 from pyspark.rdd import RDD
43 from pyspark.files import SparkFiles
C:\software\spark\spark-1.6.2-bin-hadoop2.6\python\pyspark\context.py in <module>()
26 from tempfile import NamedTemporaryFile
27
---> 28 from pyspark import accumulators
29 from pyspark.accumulators import Accumulator
30 from pyspark.broadcast import Broadcast
ImportError: cannot import name accumulators
I tried adding a PYTHONPATH environment variable pointing to the spark/python directory, based on an answer to the Stack Overflow question "importing pyspark in python shell", but it was of no help.
This worked for me:
import os
import sys

# Point Spark at the local install; a raw string keeps the backslash
# from being read as an escape sequence.
spark_path = r"D:\spark"
os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

# Make the PySpark sources and the bundled py4j importable. A py4j
# missing from sys.path is a common cause of the
# "cannot import name accumulators" error. (Do not add the pyspark
# package directory itself to sys.path; that can break its imports.)
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.9-src.zip")

from pyspark import SparkConf, SparkContext
sc = SparkContext("local", "test")
To verify:
In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x707ccf8>
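As a quick smoke test (a minimal sketch of my own, not part of the original answer), you can run a small job through the new context:

# Sum the integers 0..99 on the local Spark context; expect 4950.
rdd = sc.parallelize(list(range(100)))
print(rdd.sum())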
2018 version
INSTALL PYSPARK ON Windows 10 WITH JUPYTER NOTEBOOK AND ANACONDA NAVIGATOR
Download packages:
1) spark-2.2.0-bin-hadoop2.7.tgz
2) Java JDK 8
3) Anaconda v5.2
4) scala-2.12.6.msi
5) hadoop v2.7.1
MAKE A spark FOLDER IN THE C:\ DRIVE AND PUT EVERYTHING INSIDE IT
NOTE: DURING THE INSTALLATION OF SCALA, SET THE INSTALL PATH TO THE scala FOLDER INSIDE THE spark FOLDER
NOW SET NEW WINDOWS ENVIRONMENT VARIABLES
HADOOP_HOME=C:\spark\hadoop
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151
SCALA_HOME=C:\spark\scala\bin
SPARK_HOME=C:\spark\spark\bin
PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe
PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS=notebook
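To double-check these variables from inside any Python session (a small verification sketch I'm adding here, not part of the original steps), print them back:

import os

# Each variable set in the Windows dialog above should print its value;
# None means it is missing or the shell predates the change.
for var in ("HADOOP_HOME", "JAVA_HOME", "SCALA_HOME", "SPARK_HOME",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON",
            "PYSPARK_DRIVER_PYTHON_OPTS"):
    print(var, "=", os.environ.get(var))

Note that variables set in the Windows dialog are only visible to command prompts opened afterwards.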
NOW ADD SPARK TO THE Path VARIABLE:
Select the Path variable, click Edit, then New,
and add "C:\spark\spark\bin" to it.
That's it. Open a new command prompt and run pyspark; because PYSPARK_DRIVER_PYTHON points at Jupyter, your browser will pop up with a Jupyter notebook on localhost.
Check whether pyspark is working: type this simple code and run it.
from pyspark.sql import Row

# Row sorts keyword fields alphabetically in Spark 2.x, so this prints
# a:  Row(age=22, height=165, name='Vinay')
a = Row(name='Vinay', age=22, height=165)
print("a: ", a)