I am using Spark 1.5.1 and I'd like to retrieve the status of all jobs through the REST API. I get a correct result using /api/v1/applications/{appId}, but when I access /api/v1/applications/{appId}/jobs I get a "no such app: {appId}" response.
How should I pass the app ID here to retrieve the job statuses of an application using the Spark REST API?
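For reference, the calls I'm making look roughly like this (localhost:4040 here stands in for my driver UI's host and port):

curl http://localhost:4040/api/v1/applications/{appId}        # works
curl http://localhost:4040/api/v1/applications/{appId}/jobs   # "no such app: {appId}"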
Spark provides 4 hidden RESTful APIs:
1) Submit a job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/create
2) Kill a job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/kill/driver-id
3) Check the status of a job - curl http://SPARK_MASTER_IP:6066/v1/submissions/status/driver-id
4) Status of the Spark cluster - http://SPARK_MASTER_IP:8080/json/
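The create endpoint expects a JSON payload describing the submission. A minimal sketch (the jar path, main class, and master URL are placeholders to replace with your own):

curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "file:/path/to/my-app.jar",
    "mainClass": "com.example.MyApp",
    "clientSparkVersion": "1.5.1",
    "appArgs": [],
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
      "spark.app.name": "MyApp",
      "spark.master": "spark://SPARK_MASTER_IP:7077",
      "spark.jars": "file:/path/to/my-app.jar"
    }
  }'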
If you want to use other APIs you can try Livy; see the lucidworks URL: https://doc.lucidworks.com/fusion/3.0/Spark_ML/Spark-Getting-Started.html
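For completeness, a minimal sketch of submitting a batch job through Livy's REST API (assuming a Livy server on its default port 8998; the jar path and class name are placeholders):

curl -X POST http://LIVY_HOST:8998/batches \
  --header "Content-Type: application/json" \
  --data '{"file": "hdfs:///path/to/my-app.jar", "className": "com.example.MyApp"}'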
This is supposed to work when accessing a live driver's API endpoints, but since you're using Spark 1.5.x I think you're running into SPARK-10531, a bug where the Spark Driver UI incorrectly mixes up application names and application ids. As a result, you have to use the application name in the REST API url, e.g.
http://localhost:4040/api/v1/applications/Spark%20shell/jobs
According to the JIRA ticket, this only affects the Spark Driver UI; application IDs should work as expected with the Spark History Server's API endpoints.
This is fixed in Spark 1.6.0, which should be released soon. If you want a workaround that works on all Spark versions, though, then the following approach should do it:
The api/v1/applications endpoint misreports application names as application ids, so you should be able to hit that endpoint, extract the id field (which is actually an application name), then use it to construct the URL for the current application's job list. Note that the /applications endpoint will only ever return a single application in the Spark Driver UI, which is why this approach is safe; thanks to this property, we don't have to worry about application names being non-unique. For example, in Spark 1.5.2 the /applications endpoint can return a response which contains a record like
{
  "id": "Spark shell",
  "name": "Spark shell",
  "attempts": [ {
    "startTime": "2015-09-10T06:38:21.528GMT",
    "endTime": "1969-12-31T23:59:59.999GMT",
    "sparkUser": "",
    "completed": false
  } ]
}
If you use the contents of this id field to construct the applications/<id>/jobs URL, then your code should be future-proofed against upgrades to Spark 1.6.0, since the id field will begin reporting the proper IDs in Spark 1.6.0+.
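Putting that together, a minimal sketch of the workaround (assuming the driver UI is on localhost:4040 and that jq is available for JSON parsing; both are assumptions you'd adapt to your setup):

BASE="http://localhost:4040/api/v1"
# Take the single record's id field (actually the app name on 1.5.x, the
# real app id on 1.6.0+) and percent-encode it, e.g. "Spark%20shell".
APP_ID=$(curl -s "$BASE/applications" | jq -r '.[0].id | @uri')
curl -s "$BASE/applications/$APP_ID/jobs"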
For those who have this problem and are running on YARN:
According to the docs,
when running in YARN cluster mode, [app-id] will actually be [base-app-id]/[attempt-id], where [base-app-id] is the YARN application ID
So if your call to https://HOST:PORT/api/v1/applications/application_12345678_0123 returns something like
{
"id" : "application_12345678_0123",
"name" : "some_name",
"attempts" : [ {
"attemptId" : "1",
<...snip...>
} ]
}
you can get, e.g., the jobs by calling
https://HOST:PORT/api/v1/applications/application_12345678_0123/1/jobs
(note the "1" before "/jobs").