
How to pull client logs for Spark jobs submitted through the Apache Livy batches POST method using Airflow

I am submitting a Spark job using the Apache Livy batches POST method.

The HTTP request is sent from Airflow. After submitting the job, I track its status using the batch ID.

I want to show the driver (client) logs in the Airflow logs, so that I don't have to check multiple places (Airflow and Apache Livy/Resource Manager).

Is this possible using the Apache Livy REST API?

Ramdev Sharma asked Sep 16 '25 02:09



1 Answer

Livy exposes log endpoints for both interactive sessions and batches: GET /sessions/{sessionId}/log and GET /batches/{batchId}/log.

Documentation:

  • https://livy.incubator.apache.org/docs/latest/rest-api.html#get-sessionssessionidlog
  • https://livy.incubator.apache.org/docs/latest/rest-api.html#get-batchesbatchidlog
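
As a rough illustration of the batch log endpoint, the sketch below builds the request path and the `from`/`size` query parameters described in the Livy REST API docs, and pulls the log lines out of a decoded response. The helper names `batch_log_request` and `extract_log_lines` are made up for this example, and it only assumes the response JSON carries the lines under a `log` key.

```python
def batch_log_request(batch_id, start=0, size=100):
    """Return (endpoint, params) for GET /batches/{batchId}/log.

    Hypothetical helper: 'from' is the offset of the first log line,
    'size' is the maximum number of lines to return.
    """
    endpoint = "batches/" + str(batch_id) + "/log"
    params = {"from": start, "size": size}
    return endpoint, params


def extract_log_lines(payload):
    """Return the list of log lines from a decoded Livy log response.

    Assumes the lines sit under the "log" key; a missing key yields [].
    """
    return list(payload.get("log", []))


if __name__ == "__main__":
    endpoint, params = batch_log_request(42, start=0, size=50)
    print(endpoint, params)
    print(extract_log_lines({"id": 42, "from": 0, "log": ["stdout: ", "stderr: "]}))
```

The endpoint and params can then be handed to whatever HTTP client is in use, for example an Airflow HttpHook.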

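You can write Python methods like the ones below to fetch the logs (the wrapper class name is only illustrative):

import json

# Airflow 2+ import path; older releases expose HttpHook from airflow.hooks.http_hook
from airflow.providers.http.hooks.http import HttpHook


class LivyBatchLogs:  # hypothetical wrapper class, not part of Airflow
    def __init__(self, http_conn_id):
        # http_conn_id is the Airflow connection that points at the Livy server
        self.http = HttpHook("GET", http_conn_id=http_conn_id)

    def _http_rest_call(self, method, endpoint, data=None, headers=None, extra_options=None):
        self.http.method = method
        # Serialize a body only when there is one, so a GET is not sent the string "null"
        payload = json.dumps(data) if data is not None else None
        return self.http.run(endpoint, payload, headers, extra_options=extra_options or {})

    def _get_batch_session_logs(self, batch_id):
        endpoint = "batches/" + str(batch_id) + "/log"
        response = self._http_rest_call(method="GET", endpoint=endpoint)
        # response.json() gives the decoded body; the log lines are under its "log" key
        return response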
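
To get the lines into the Airflow task log, one option is to page through the log endpoint and write each line with the standard `logging` module, which Airflow normally captures for a running task. The sketch below is a minimal version of that idea, not a definitive implementation: `forward_batch_logs` and the `fetch_page` callable are hypothetical names, and `fetch_page(batch_id, start, size)` is assumed to return the decoded JSON of GET /batches/{batchId}/log with the lines under `log`.

```python
import logging


def forward_batch_logs(fetch_page, batch_id, logger, page_size=100):
    """Copy Livy batch log lines to `logger`, page by page.

    fetch_page is a hypothetical callable (batch_id, start, size) -> dict
    that returns the decoded log response. Returns the number of lines written.
    """
    start = 0
    while True:
        payload = fetch_page(batch_id, start, page_size)
        lines = payload.get("log", [])
        for line in lines:
            logger.info("%s", line)
        start += len(lines)
        # A short (or empty) page means there is nothing more to read right now
        if len(lines) < page_size:
            return start


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    fake_log = ["line %d" % i for i in range(5)]

    def fake_fetch(batch_id, start, size):
        # Stand-in for the real HTTP call, used only to demonstrate the loop
        return {"id": batch_id, "from": start, "log": fake_log[start:start + size]}

    forward_batch_logs(fake_fetch, 1, logging.getLogger("livy"), page_size=2)
```

In a real task, `fetch_page` would wrap the HTTP call to Livy, and the function could be called each time the batch status is polled as well as once after the batch finishes.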
kaxil answered Sep 19 '25 16:09
