I submit Hadoop applications through the YARN Java API rather than from the terminal. I am looking for a way to get the YARN aggregated logs through the YARN API after an application has finished.
Of course, this can be done with the simple command "yarn logs -applicationId {my_application_ID}", but I want to do it through the API.
Does anyone know how to get those logs using the API and not the command line?
Thanks.
YARN Log Aggregation Overview
The system which maintains the application logs in HDFS is called the Log Aggregation system and is flexible enough to handle any file system, not just HDFS.
As you can see in the source code (https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java), this is not trivial. Clearly, a log API is missing from the YARN API.
Via the REST API (https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_API), you can get the link to the ApplicationMaster container logs:
curl http://yarn.infra/ws/v1/cluster/apps/application_1502112083252_1001
...
<amContainerLogs>
http://node-1.infra:8042/node/containerlogs/container_e41_1502112083252_1001_01_000001/hdfs
</amContainerLogs>
...
And the application attempts (if useful for you):
curl http://yarn.infra/ws/v1/cluster/apps/application_1502112083252_1001/appattempts
..
<logsLink>
http://node-3.infra:8042/node/containerlogs/container_e41_1502112083252_1001_01_000001/hdfs
</logsLink>
..
Curl these links in turn to download the local logs. This is not the full aggregated log, though. (I didn't find exactly how to get it; feel free to complete my answer if you find it.)
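Since the question is about doing this from Java, here is a minimal sketch of the same first REST call without curl. It uses only the JDK, asks the ResourceManager for XML, and pulls out the amContainerLogs element shown above. The class and method names are made up for illustration, and the ResourceManager base URL (for example http://yarn.infra) and the application ID are passed in as arguments.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class YarnLogLinks {

    /** Returns the trimmed text of the first element named tag, or null if there is none. */
    public static String firstTagText(InputStream xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Calls /ws/v1/cluster/apps/{appId} on the ResourceManager and returns the amContainerLogs URL. */
    public static String fetchAmContainerLogs(String rmBaseUrl, String appId) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(rmBaseUrl + "/ws/v1/cluster/apps/" + appId).toURL().openConnection();
        // The REST API can answer in JSON or XML; ask for XML to match the output above.
        conn.setRequestProperty("Accept", "application/xml");
        try (InputStream in = conn.getInputStream()) {
            return firstTagText(in, "amContainerLogs");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java YarnLogLinks http://yarn.infra application_1502112083252_1001
        System.out.println(fetchAmContainerLogs(args[0], args[1]));
    }
}
```

The returned URL points at a NodeManager web page, so fetching it gives the same local logs as the curl approach, not the aggregated ones. A secured cluster would need authentication that this sketch does not handle.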
As far as I know, YARN writes the logs to a file system, possibly HDFS (in my case: hdfs:hadoopsrv:9000/var/log/hadoop/app-logs/), and a user with access rights to these files can get them directly. From what I understand, yarn logs -applicationId simply gets them from there.
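To make that location concrete, here is a small sketch that builds the per-application directory under the aggregation root. It assumes the Hadoop 2.x layout <remote-app-log-dir>/<user>/<suffix>/<applicationId>, where the suffix comes from yarn.nodemanager.remote-app-log-dir-suffix (default "logs"). Newer releases may arrange the directories differently, so check your own cluster. The class and method names are hypothetical, and the user "hdfs" is only a guess for the example.

```java
public class AggregatedLogPath {

    /**
     * Builds root/user/suffix/appId.
     * Assumption: Hadoop 2.x aggregated-log layout; verify against your cluster.
     */
    public static String appLogDir(String root, String user, String suffix, String appId) {
        // Drop a single trailing slash so the result has no doubled separators.
        String base = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        return base + "/" + user + "/" + suffix + "/" + appId;
    }

    public static void main(String[] args) {
        // Root taken from the path mentioned above; user and suffix are assumed values.
        System.out.println(appLogDir("/var/log/hadoop/app-logs/", "hdfs", "logs",
                "application_1502112083252_1001"));
    }
}
```

The resulting directory can then be listed and read with whatever file-system client you already use. As far as I know, the files in it are written in YARN's own aggregated format, one per NodeManager, so they are not plain text.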