
How to access SparkContext in pyspark script


The following SO question, How to run script in Pyspark and drop into IPython shell when done?, tells how to launch a pyspark script:

 %run -d myscript.py 

But how do we access the existing Spark context?

Just creating a new one does not work:

---->  sc = SparkContext("local", 1)

ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local) created by <module> at /Library/Python/2.7/site-packages/IPython/utils/py3compat.py:204

But trying to use an existing one... well, which existing one?

In [50]: for s in filter(lambda x: 'SparkContext' in repr(x[1]) and len(repr(x[1])) < 150, locals().iteritems()):
   ....:     print s

('SparkContext', <class 'pyspark.context.SparkContext'>)

i.e. there is no variable holding a SparkContext instance.

asked Mar 11 '15 by WestCoastProjects


People also ask

How do you get SparkContext in PySpark?

In Spark/PySpark you can get the current active SparkContext and its configuration settings by accessing spark.sparkContext.getConf().getAll(); here spark is a SparkSession object, and getAll() returns the settings as key/value pairs (Array[(String, String)] in Scala, a list of tuples in PySpark).
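For example, here is a minimal sketch of reading the active configuration; the SparkSession creation and the app name "conf-demo" are illustrative, since in a running pyspark shell spark already exists:

from pyspark.sql import SparkSession

# Creating the session here keeps the sketch self-contained;
# inside the pyspark shell, `spark` is already defined.
spark = SparkSession.builder.appName("conf-demo").getOrCreate()

# getAll() returns the current settings as a list of (key, value) pairs
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)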

How do you use SparkContext in PySpark?

SparkContext is the entry point to any Spark functionality. When we run a Spark application, a driver program starts; it contains the main function, and your SparkContext is initialized there. The driver program then runs the operations inside the executors on the worker nodes.
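As an illustration, a minimal sketch of a driver script that initializes its own SparkContext; the master and app name are illustrative values:

from pyspark import SparkContext

# Initialize the context in the driver program; the work below is distributed to executors.
sc = SparkContext(master="local[*]", appName="driver-demo")

total = sc.parallelize(range(10)).sum()
print(total)  # 45

sc.stop()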

How do I access PySpark shell?

Go to the Spark installation directory from the command line and type bin/pyspark, then press enter; this launches the pyspark shell and gives you a prompt to interact with Spark in Python. If you have added Spark to your PATH, just enter pyspark in a command line or terminal.

How many SparkContext can PySpark create?

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster.
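In PySpark this means stopping the active context before creating another one; a minimal sketch, with illustrative app names:

from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="first")
sc.stop()  # without this, the next line raises "Cannot run multiple SparkContexts at once"

sc = SparkContext(master="local[*]", appName="second")
sc.stop()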


1 Answer

Include the following import:

from pyspark.context import SparkContext 

and then invoke the getOrCreate() static method on SparkContext:

sc = SparkContext.getOrCreate() 
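For example, a minimal sketch of a script meant to be launched with %run from the pyspark/IPython shell (the file name myscript.py and the printed values are illustrative); getOrCreate() returns the shell's existing context instead of raising the "Cannot run multiple SparkContexts at once" error:

# myscript.py
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()  # reuses the shell's context if one exists
print(sc.appName)                         # e.g. "PySparkShell" when run inside the shell
print(sc.parallelize([1, 2, 3]).count())  # 3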
answered Oct 09 '22 by TechnoIndifferent