 

How to extract application ID from the PySpark context

A previous question recommends sc.applicationId, but it is not present in PySpark, only in Scala.

So, how do I figure out the application ID (for YARN) of my PySpark process?

asked Jun 22 '15 by sds

People also ask

How do I get my Spark application ID?

In Spark we can get the Spark application ID inside a task programmatically using SparkEnv.get.blockManager.
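
That snippet is Scala-side; PySpark does not expose SparkEnv, but a rough equivalent (a sketch assuming a recent PySpark where sc.applicationId exists) is to read the ID on the driver and let the task closure carry it to the executors:

>>> app_id = sc.applicationId                      # read once on the driver
>>> rdd = sc.parallelize(range(3))
>>> rdd.map(lambda x: (app_id, x)).collect()       # app_id ships to executors inside the closure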

What is Spark application ID?

applicationId is a unique identifier for the Spark application. Its format depends on the scheduler implementation: in the case of a local Spark app it looks like 'local-1433865536131', and in the case of YARN it looks like 'application_1433865536131_34483'.
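
For example, a quick check in a local PySpark session (a minimal sketch; the builder settings are just placeholders) returns the local-* form, while the same property under YARN returns the application_* form:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.master("local[*]").appName("app-id-demo").getOrCreate()
>>> spark.sparkContext.applicationId    # e.g. 'local-1433865536131' locally, 'application_1433865536131_34483' on YARN
>>> spark.stop()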

How do I get Spark context?

In Spark/PySpark you can get the currently active SparkContext and its configuration settings by accessing spark.sparkContext.getConf().getAll(), where spark is a SparkSession object and getAll() returns the configuration as key/value pairs (Array[(String, String)] in Scala, a list of tuples in PySpark).
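
For instance, in PySpark (assuming an already running SparkSession named spark):

>>> pairs = spark.sparkContext.getConf().getAll()          # list of (key, value) tuples
>>> [kv for kv in pairs if kv[0].startswith("spark.app")]  # e.g. spark.app.name, spark.app.id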

What is the use of Spark context?

A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
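
A minimal PySpark sketch of that lifecycle (master and app name here are just placeholders):

>>> from pyspark import SparkConf, SparkContext
>>> conf = SparkConf().setMaster("local[*]").setAppName("ctx-demo")
>>> sc = SparkContext(conf=conf)         # only one active SparkContext per JVM
>>> rdd = sc.parallelize([1, 2, 3])      # the context creates RDDs, accumulators, broadcast variables
>>> sc.stop()                            # stop() it before constructing a new one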


1 Answer

You could use the Java SparkContext object through the Py4J RPC gateway:

>>> sc._jsc.sc().applicationId()
u'application_1433865536131_34483'

Please note that sc._jsc is an internal variable and not part of the public API, so there is a (rather small) chance that it may change in the future.

I'll submit a pull request to add a public API call for this.
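
For what it's worth, newer PySpark releases expose this as a public property on SparkContext, so on recent versions the internal _jsc route is not needed:

>>> sc.applicationId
u'application_1433865536131_34483'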

answered Oct 03 '22 by vvladymyrov