 

Spark context 'sc' not defined


I am new to Spark and I am trying to install PySpark by referring to the site below.

http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/

I tried both the prebuilt package and building the Spark package myself with SBT.

When I try to run Python code in the IPython Notebook I get the error below.

    NameError                                 Traceback (most recent call last)
    <ipython-input-1-f7aa330f6984> in <module>()
          1 # Check that Spark is working
    ----> 2 largeRange = sc.parallelize(xrange(100000))
          3 reduceTest = largeRange.reduce(lambda a, b: a + b)
          4 filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()
          5
    NameError: name 'sc' is not defined

In the command window I can see the error below.

    Failed to find Spark assembly JAR. You need to build Spark before running this program.

Note that I got a Scala prompt when I executed the spark-shell command.

Update:

With the help of a friend I was able to fix the issue related to the Spark assembly JAR by correcting the contents of the .ipython/profile_pyspark/startup/00-pyspark-setup.py file.

Now only the problem with the SparkContext variable remains. I am changing the title to reflect my current issue.
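For reference, a 00-pyspark-setup.py along the lines of the linked tutorial looks roughly like this (a sketch only; the py4j zip file name is an example and depends on your Spark version):

    import os
    import sys

    # SPARK_HOME must point at your Spark installation
    spark_home = os.environ.get('SPARK_HOME', None)
    if not spark_home:
        raise ValueError('SPARK_HOME environment variable is not set')

    # Put the PySpark sources and the bundled py4j on the Python path
    sys.path.insert(0, os.path.join(spark_home, 'python'))
    sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

    # Run the PySpark shell bootstrap, which creates the SparkContext `sc`
    execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))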

Arvind asked Jun 10 '15 18:06


People also ask

How does Spark define SC?

In Spark/PySpark, 'sc' is a SparkContext object that is created up front by default in the spark-shell/pyspark shell; the same object is also available in Databricks. However, when you write a standalone PySpark program you need to create a SparkSession, which internally creates a SparkContext.
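For example, in a standalone PySpark program on Spark 2.x or later you could create the session yourself and pull sc out of it (a sketch; the master and app name are placeholder values):

    from pyspark.sql import SparkSession

    # Build a SparkSession; it creates the underlying SparkContext
    spark = SparkSession.builder \
        .master("local[2]") \
        .appName("example") \
        .getOrCreate()

    sc = spark.sparkContext   # the same object the shells expose as `sc`
    print(sc.version)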

What is SQLContext(sc)?

    // sc is an existing SparkContext.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._
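Roughly the same thing in PySpark, assuming sc already exists (SQLContext is superseded by SparkSession in Spark 2.x and later):

    from pyspark.sql import SQLContext

    # sc is an existing SparkContext
    sqlContext = SQLContext(sc)

    # build a small DataFrame to confirm the SQLContext works
    df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()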

How do I get Spark context?

In Spark/PySpark you can get the current active SparkContext through spark.sparkContext, and its configuration settings through spark.sparkContext.getConf().
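For instance, assuming an active SparkSession named spark (as in the pyspark shell or Databricks):

    sc = spark.sparkContext          # the active SparkContext
    conf = sc.getConf()              # its SparkConf
    print(conf.get("spark.master"))  # e.g. local[2], depending on your setup
    print(conf.getAll())             # all configured (key, value) pairs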


2 Answers

You need to do the following once pyspark is on your path:

    from pyspark import SparkContext
    sc = SparkContext()
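Once sc exists, you can re-run the check from the question to confirm it works (the master and app name below are example values; the question used xrange, which is Python 2 only):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "NotebookCheck")
    largeRange = sc.parallelize(range(100000))
    reduceTest = largeRange.reduce(lambda a, b: a + b)
    filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()
    print(reduceTest, filterReduceTest)   # 4999950000 714264285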
venuktan answered Oct 24 '22 15:10


One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell" 

There is a change in python/pyspark/java_gateway.py which requires PYSPARK_SUBMIT_ARGS to include pyspark-shell whenever a PYSPARK_SUBMIT_ARGS variable is set by the user.
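If you cannot change the shell environment, a roughly equivalent workaround is to set the variable from Python before pyspark creates its gateway (a sketch; note that pyspark-shell must come last):

    import os

    # Must be set before the first SparkContext is created
    os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

    from pyspark import SparkContext
    sc = SparkContext()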

Zak.2Z answered Oct 24 '22 16:10