How to get applicationId of Spark application deployed to YARN in Scala?

I'm using the following Scala code (as a custom spark-submit wrapper) to submit a Spark application to a YARN cluster:

    import scala.sys.process._

    val result = Seq(spark_submit_script_here).!!

All I have at the time of submission is spark-submit and the Spark application's jar (no SparkContext). I'd like to capture applicationId from result, but it's empty.

I can see the applicationId and the rest of the YARN messages in my command-line output:

INFO yarn.Client: Application report for application_1450268755662_0110

How can I read it from within the code and get the applicationId?

305

asked Jan 04 '16 09:01

nish1013

2 Answers

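As stated in Spark issue 5439 (SPARK-5439), you can either use SparkContext.applicationId or parse the stderr output. Since you are wrapping the spark-submit command in your own script/object and have no SparkContext, you need to read stderr and extract the application ID from it. Note that `!!` returns only the process's standard output, which is why result is empty: the YARN client messages are written to stderr.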
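The stderr route can be sketched in Scala roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the object name `SubmitAndCaptureAppId`, the helper names `extractApplicationId` and `submit`, and the regular expression are all introduced here for illustration. `ProcessLogger` is part of `scala.sys.process`.

```scala
import scala.sys.process._

// Hypothetical helper object; the names here are illustrative, not part of Spark.
object SubmitAndCaptureAppId {
  // Assumed ID shape, based on the log line in the question:
  // application_<cluster timestamp>_<sequence number>
  private val AppIdPattern = """application_\d+_\d+""".r

  /** Returns the first YARN application ID found in the given text, if any. */
  def extractApplicationId(text: String): Option[String] =
    AppIdPattern.findFirstIn(text)

  /** Runs the command, collects stderr, and returns (exit code, application ID if found). */
  def submit(command: Seq[String]): (Int, Option[String]) = {
    val stderr = new StringBuilder
    val logger = ProcessLogger(
      (line: String) => (),                               // stdout lines are ignored here
      (line: String) => { stderr.append(line).append('\n'); () } // keep stderr lines
    )
    val exitCode = command.!(logger) // blocks until the process exits
    (exitCode, extractApplicationId(stderr.toString))
  }
}
```

With the placeholder from the question, a call might look like `val (exitCode, appId) = SubmitAndCaptureAppId.submit(Seq(spark_submit_script_here))`. Using `!` with a `ProcessLogger` instead of `!!` also avoids the exception that `!!` throws on a non-zero exit code, so the caller can inspect both the exit code and the captured ID.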

184

answered Oct 02 '22 01:10

Markon

If you are submitting the job via Python, this is how you can get the YARN application ID:

    import re
    import subprocess

    # app_name and config are defined elsewhere in the submitting application.
    cmd_list = [{
        'cmd': '/usr/bin/spark-submit --name %s --master yarn --deploy-mode cluster '
               '--executor-memory %s --executor-cores %s --num-executors %s '
               '--class %s %s %s'
               % (
                   app_name,
                   config.SJ_EXECUTOR_MEMORY,
                   config.SJ_EXECUTOR_CORES,
                   config.SJ_NUM_OF_EXECUTORS,
                   config.PRODUCT_SNAPSHOT_SKU_PRESTO_CLASS,
                   config.SPARK_JAR_LOCATION,
                   config.SPARK_LOGGING_ENABLED
               ),
        'cwd': config.WORK_DIR
    }]
    cmd_obj = cmd_list[0]

    # Capture stderr, where spark-submit writes the YARN client messages.
    cmd_output = subprocess.run(cmd_obj['cmd'], shell=True, check=True,
                                cwd=cmd_obj['cwd'], stderr=subprocess.PIPE)
    stderr_text = cmd_output.stderr.decode("utf-8")

    # The sequence number has at least four digits but can grow longer.
    yarn_application_ids = re.findall(r"application_\d{13}_\d{4,}", stderr_text)
    if yarn_application_ids:
        yarn_application_id = yarn_application_ids[0]
        yarn_command = "yarn logs -applicationId " + yarn_application_id

answered Oct 02 '22 01:10

Rajiv

Tags: scala, apache-spark, hadoop-yarn
