I am trying to list the applications that were run on a Hadoop cluster. I can filter the list by application state as follows:
>yarn application -list -appStates FINISHED
But that still pulls up the whole history (the last 4-5 days, I guess based on the YARN Timeline Server config).
Is there a way to filter that by a specific date or something like last 24 hours?
You can use the RM Apps API to do this. For a simple test you can run:
$ date +"%s"
1495215569
$ let x=1495215569-86400
$ echo $x
1495129169
$ curl 'RMURL/ws/v1/cluster/apps?startedTimeBegin=1495129169000' | python -m json.tool
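This returns the apps that started within the 24 hours (86400 seconds) before date was run. The trailing 000 is needed because the time parameters take milliseconds, not seconds. Replace RMURL with the web address of your ResourceManager. Supported parameters include states, startedTimeBegin, startedTimeEnd, finishedTimeBegin and finishedTimeEnd.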
See https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API for more details.
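If you want the same thing as a single script, here is a minimal sketch. It assumes you care about apps that finished (rather than started) in the last 24 hours, so it combines the states and finishedTimeBegin parameters. RM_URL, cutoff_ms and apps_url are names made up for this example, and the host in the comment is a placeholder for your own ResourceManager.

```shell
#!/usr/bin/env bash
# Sketch: query the RM Apps API for applications finished in the last N hours.
# RM_URL, cutoff_ms and apps_url are illustrative names, not part of YARN.

# Print the epoch time in milliseconds for "hours" before the epoch second "now_s".
cutoff_ms() {
  local now_s=$1 hours=$2
  echo $(( (now_s - hours * 3600) * 1000 ))
}

# Build the request URL from the RM base address and a cutoff in milliseconds.
apps_url() {
  local rm_url=$1 begin_ms=$2
  echo "${rm_url}/ws/v1/cluster/apps?states=FINISHED&finishedTimeBegin=${begin_ms}"
}

# Only contact the cluster when RM_URL is set, e.g. RM_URL=http://rm-host:8088
if [ -n "${RM_URL:-}" ]; then
  curl -s "$(apps_url "$RM_URL" "$(cutoff_ms "$(date +%s)" 24)")" | python -m json.tool
fi
```

Change the 24 to widen or narrow the window, or swap finishedTimeBegin for startedTimeBegin to filter on start time as in the commands above.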