A previous question recommends sc.applicationId, but it is not present in PySpark, only in Scala.

So, how do I figure out the application ID (for YARN) of my PySpark process?
In Spark we can get the Spark application ID inside a task programmatically using SparkEnv.get.blockManager.applicationId. It is a unique identifier for the Spark application, and its format depends on the scheduler implementation: for a local Spark app it looks like 'local-1433865536131', and on YARN it looks like 'application_1433865536131_34483'.
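To make the format remark concrete, here is a minimal PySpark sketch (the names spark and app_id are mine, and the id is fetched with the internal Py4J call from the answer further down) that branches on the prefix of the id:

app_id = spark.sparkContext._jsc.sc().applicationId()  # spark is an active SparkSession

# The prefix of the id reveals which scheduler assigned it.
if app_id.startswith("local-"):
    print("Running in local mode:", app_id)    # e.g. local-1433865536131
elif app_id.startswith("application_"):
    print("Running on YARN:", app_id)          # e.g. application_1433865536131_34483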
In Spark/PySpark you can get the currently active SparkContext and its configuration settings by calling spark.sparkContext.getConf().getAll(), where spark is a SparkSession object. In Scala, getAll returns an Array[(String, String)]; in PySpark, getAll() returns a list of (key, value) tuples, as the sketch below shows.
A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
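As a hedged sketch of that getConf().getAll() route in PySpark, assuming an active SparkSession named spark and that the running scheduler has populated the spark.app.id entry:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app-id-demo").getOrCreate()

# In PySpark, getAll() returns a list of (key, value) tuples
# (the Scala counterpart returns Array[(String, String)]).
conf_pairs = spark.sparkContext.getConf().getAll()

# The application id is normally exposed under the spark.app.id key.
app_id = dict(conf_pairs).get("spark.app.id")
print(app_id)  # 'local-...' locally, 'application_...' on YARN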
You could use the Java SparkContext object through the Py4J RPC gateway:
>>> sc._jsc.sc().applicationId()
u'application_1433865536131_34483'
Please note that sc._jsc is an internal variable and not part of the public API, so there is a (rather small) chance that it may change in the future.
I'll submit a pull request to add a public API call for this.
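For completeness, a self-contained sketch of that Py4J approach in a standalone PySpark script; the hasattr guard is only defensive, precisely because _jsc is internal and could disappear in a later release:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("get-app-id").getOrCreate()
sc = spark.sparkContext

# Reach the underlying Scala SparkContext through the Py4J gateway.
if hasattr(sc, "_jsc"):
    app_id = sc._jsc.sc().applicationId()
else:
    # Fallback: read the id from the context's configuration.
    app_id = sc.getConf().get("spark.app.id")

print(app_id)  # on YARN: e.g. application_1433865536131_34483
spark.stop()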