I am new to Spark and I am trying to install PySpark by referring to the site below.
http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
I tried installing both the prebuilt package and building the Spark package through SBT.
When I try to run Python code in an IPython Notebook, I get the error below.
NameError                                 Traceback (most recent call last)
<ipython-input-1-f7aa330f6984> in <module>()
      1 # Check that Spark is working
----> 2 largeRange = sc.parallelize(xrange(100000))
      3 reduceTest = largeRange.reduce(lambda a, b: a + b)
      4 filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()

NameError: name 'sc' is not defined
In the command window I can see the error below.
Failed to find Spark assembly JAR.
You need to build Spark before running this program.
Note that I get a Scala prompt when I execute the spark-shell command.
Update:
With the help of a friend I was able to fix the issue related to the Spark assembly JAR by correcting the contents of the .ipython/profile_pyspark/startup/00-pyspark-setup.py file.
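For reference, a 00-pyspark-setup.py along the lines of that guide looks roughly like the sketch below; the py4j zip file name is an assumption and depends on the installed Spark version.

import os
import sys

# Assumes SPARK_HOME points at the Spark installation directory
spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put the PySpark libraries on the Python path
sys.path.insert(0, os.path.join(spark_home, 'python'))
# The py4j file name below is an assumption; check python/lib/ for the actual version
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Run the PySpark shell initialization, which creates the sc variable (execfile is Python 2)
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))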
Now my only remaining problem is the SparkContext variable. I am changing the title to appropriately reflect my current issue.
In Spark/PySpark, 'sc' is a SparkContext object that is created up front by default in the spark-shell/pyspark shell (this object is also available in Databricks notebooks). However, when you write a standalone PySpark program, you need to create a SparkSession, which internally creates the SparkContext.
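For example, in a standalone PySpark program, a minimal sketch (assuming Spark 2.x or later, where SparkSession is available; the app name and master are illustrative) would be:

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession
spark = SparkSession.builder \
    .appName("example") \
    .master("local[2]") \
    .getOrCreate()

# The underlying SparkContext is exposed by the session
sc = spark.sparkContext
print(sc.version)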
In the Scala shell, for example, an SQLContext is created from the existing sc in the same way:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
In Spark/PySpark you can get the currently active SparkContext and its configuration settings by accessing spark.sparkContext.getConf.
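For instance, a quick sketch to inspect the active configuration (assuming an existing SparkSession named spark, as in the pyspark shell or a notebook):

# getConf() returns the SparkConf; getAll() lists its (key, value) pairs
conf = spark.sparkContext.getConf()
for key, value in conf.getAll():
    print(key + "=" + value)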
You need to do the following after you have pyspark on your path:

from pyspark import SparkContext
sc = SparkContext()
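After that, the check from the question should work, for example:

# Re-run the snippet that originally failed with NameError
largeRange = sc.parallelize(xrange(100000))
reduceTest = largeRange.reduce(lambda a, b: a + b)
filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()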
One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
There is a change in python/pyspark/java_gateway.py which requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by a user.
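If you prefer to set the variable from inside the notebook instead of the shell, a minimal sketch would be the following; it must run before the SparkContext is created, since java_gateway.py reads the variable when launching the JVM:

import os

# Must be set before a SparkContext is created
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark import SparkContext
sc = SparkContext()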