I am using Spark 1.5.1 and I'd like to retrieve the status of all jobs through the REST API. I get the correct result from /api/v1/applications/{appId}, but when I access the jobs endpoint, /api/v1/applications/{appId}/jobs, I get a "no such app:{appID}" response. How should I pass the app ID here to retrieve the job status of an application using the Spark REST API?

How to get all jobs status through spark REST API?

3 Answers

Spark provides 4 hidden RESTful APIs:

1) Submit the job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/create

2) To kill the job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/kill/driver-id

3) To check the status of the job - curl http://SPARK_MASTER_IP:6066/v1/submissions/status/driver-id

4) Status of the Spark Cluster - http://SPARK_MASTER_IP:8080/json/
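The submission endpoints listed above can also be called from code rather than curl. The following is a minimal sketch, assuming a standalone master reachable on port 6066 as in the commands above; the function names `submission_url` and `submission_status` are hypothetical helpers, not part of Spark.

```python
import json
import urllib.request


def submission_url(master_host, action, driver_id=None, port=6066):
    """Build a URL for the submission endpoints listed above.

    `action` is one of "create", "kill" or "status"; the last two
    need the driver id that the master assigned to the submission.
    """
    if action not in ("create", "kill", "status"):
        raise ValueError(f"unknown action: {action}")
    url = f"http://{master_host}:{port}/v1/submissions/{action}"
    if action in ("kill", "status"):
        if not driver_id:
            raise ValueError(f"{action} needs a driver id")
        url = f"{url}/{driver_id}"
    return url


def submission_status(master_host, driver_id):
    """GET the status endpoint and return the parsed JSON body.

    Assumes the master answers with JSON; not called at import time.
    """
    url = submission_url(master_host, "status", driver_id)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Keeping the URL construction separate from the network call makes the first function easy to check without a running cluster.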

If you want to use other APIs, you can try Livy or Lucidworks: https://doc.lucidworks.com/fusion/3.0/Spark_ML/Spark-Getting-Started.html


answered Jan 05 '23 02:01

Ashwini Kumar

This is supposed to work when accessing a live driver's API endpoints, but since you're using Spark 1.5.x I think you're running into SPARK-10531, a bug where the Spark Driver UI incorrectly mixes up application names and application IDs. As a result, you have to use the application name in the REST API URL, e.g.

http://localhost:4040/api/v1/applications/Spark%20shell/jobs

According to the JIRA ticket, this only affects the Spark Driver UI; application IDs should work as expected with the Spark History Server's API endpoints.

This is fixed in Spark 1.6.0, which should be released soon. If you want a workaround that works on all Spark versions, though, try the following approach:

The api/v1/applications endpoint misreports application names as application IDs, so you should be able to hit that endpoint, extract the id field (which is actually an application name), then use that to construct the URL for the current application's job list. Note that the /applications endpoint will only ever return a single application in the Spark Driver UI, which is why this approach should be safe: we don't have to worry about the non-uniqueness of application names. For example, in Spark 1.5.2 the /applications endpoint can return a response which contains a record like

{
   id: "Spark shell",
   name: "Spark shell",
   attempts: [
   {
       startTime: "2015-09-10T06:38:21.528GMT",
       endTime: "1969-12-31T23:59:59.999GMT",
       sparkUser: "",
       completed: false
   }]
}

If you use the contents of this id field to construct the applications/<id>/jobs URL then your code should be future-proofed against upgrades to Spark 1.6.0, since the id field will begin reporting the proper IDs in Spark 1.6.0+.
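The workaround described above could be sketched roughly as follows. This is an illustration only: it assumes the driver UI is at http://localhost:4040 as in the example URL, and `jobs_url`, `fetch_json` and `list_jobs` are hypothetical helper names. The id is percent-encoded because in Spark 1.5.x it may be a name containing spaces, such as "Spark shell".

```python
import json
import urllib.request
from urllib.parse import quote


def jobs_url(base_url, applications):
    """Build the jobs URL from a parsed /api/v1/applications response.

    Uses whatever the `id` field reports: an application name on
    Spark 1.5.x driver UIs, a real application id on 1.6.0+.
    """
    if len(applications) != 1:
        # A driver UI is expected to list exactly one application.
        raise ValueError("expected exactly one application")
    app_id = applications[0]["id"]
    # Percent-encode so that "Spark shell" becomes "Spark%20shell".
    return f"{base_url}/api/v1/applications/{quote(app_id, safe='')}/jobs"


def fetch_json(url):
    """GET a URL and parse the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_jobs(base_url="http://localhost:4040"):
    """Look up the application id first, then fetch its job list."""
    apps = fetch_json(f"{base_url}/api/v1/applications")
    return fetch_json(jobs_url(base_url, apps))
```

Calling `list_jobs()` against a running driver would perform both requests; `jobs_url` can be checked on its own without a cluster.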

answered Jan 05 '23 03:01

Josh Rosen

For those who have this problem and are running on YARN:

According to the docs,

when running in YARN cluster mode, [app-id] will actually be [base-app-id]/[attempt-id], where [base-app-id] is the YARN application ID

So if your call to https://HOST:PORT/api/v1/applications/application_12345678_0123 returns something like

{
  "id" : "application_12345678_0123",
  "name" : "some_name",
  "attempts" : [ {
    "attemptId" : "1",
    <...snip...>
  } ]
}

you can get, e.g., the jobs by calling

https://HOST:PORT/api/v1/applications/application_12345678_0123/1/jobs

(note the "1" before "/jobs").
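As a rough sketch of the rule above, the jobs URL can be derived from the application record itself. The function name `attempt_jobs_urls` is hypothetical, and falling back to the plain application id when an attempt carries no attemptId is an assumption about records from non-cluster modes, not something the quoted docs state.

```python
def attempt_jobs_urls(base_url, app):
    """Return one jobs URL per attempt listed in an application record.

    `app` is the parsed JSON returned by /api/v1/applications/<app-id>.
    When an attempt has an attemptId, it is inserted before "/jobs";
    otherwise (assumption) the plain application id is used.
    """
    app_id = app["id"]
    urls = []
    for attempt in app.get("attempts", []):
        attempt_id = attempt.get("attemptId")
        if attempt_id is None:
            urls.append(f"{base_url}/api/v1/applications/{app_id}/jobs")
        else:
            urls.append(
                f"{base_url}/api/v1/applications/{app_id}/{attempt_id}/jobs"
            )
    return urls


# The record from the answer above, with the snipped fields left out.
record = {
    "id": "application_12345678_0123",
    "name": "some_name",
    "attempts": [{"attemptId": "1"}],
}
urls = attempt_jobs_urls("https://HOST:PORT", record)
```

Here `urls` holds a single entry that matches the URL shown above, with the "1" before "/jobs".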

answered Jan 05 '23 03:01

margold

Tags:

rest

apache-spark

Vaibhav Raut
