I have run some Spark applications on a YARN cluster. The applications show up in the "All Applications" page of the YARN UI (http://host:8088/cluster), but the yarn application -list
command doesn't return any results. What could be the cause of this?
When you use the -list option without the -appTypes or -appStates options, it applies default filtering for application types and states (see the output below). If none of your applications match the default filter, you will get no results:
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
If you look at the help for -list, it states the following:
"List applications. Supports optional use of -appTypes to filter applications based on application type, and -appStates to filter applications based on application state."
This is a bit misleading.
If you don't specify -appStates, it defaults to the states SUBMITTED, ACCEPTED, and RUNNING for filtering. See the code below from the listApplications() method of org.apache.hadoop.yarn.client.cli.ApplicationCLI:
private void listApplications() {
  ............
  if (allAppStates) {
    // -appStates ALL was passed: include every application state
    for (YarnApplicationState appState : YarnApplicationState.values()) {
      appStates.add(appState);
    }
  } else {
    // No -appStates given: default to RUNNING, ACCEPTED and SUBMITTED only
    if (appStates.isEmpty()) {
      appStates.add(YarnApplicationState.RUNNING);
      appStates.add(YarnApplicationState.ACCEPTED);
      appStates.add(YarnApplicationState.SUBMITTED);
    }
  }
  ............
}
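If you need the same listing programmatically, the YarnClient API takes the state filter explicitly, so nothing is hidden by the CLI defaults. Here is a minimal sketch (it assumes the ResourceManager address is picked up from the Hadoop configuration on your classpath; the class name ListAllApps is just for illustration):

import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListAllApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Pass every state explicitly -- the equivalent of "-appStates ALL"
      for (ApplicationReport report :
          yarnClient.getApplications(EnumSet.allOf(YarnApplicationState.class))) {
        System.out.println(report.getApplicationId() + "\t"
            + report.getName() + "\t" + report.getYarnApplicationState());
      }
    } finally {
      yarnClient.stop();
    }
  }
}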
As per the code above, the following logic applies:
CMD> yarn application -list
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
CMD> yarn application -list -appStates ALL
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):268
CMD> yarn application -list -appStates FINISHED
Total number of applications (application-types: [] and states: [FINISHED]):136
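The two filters can also be combined. For example, since Spark on YARN registers its applications under the application type SPARK, a command like the following should list only finished Spark applications (shown here as an illustration, not output from my cluster):
CMD> yarn application -list -appTypes SPARK -appStates FINISHED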
It turns out that I had enabled log aggregation in YARN but had set yarn.nodemanager.remote-app-log-dir to a custom HDFS directory (/tmp/yarnlogs). Logs were actually being aggregated under /tmp/yarnlogs in HDFS, but the yarn command was still looking for logs at the default location (/tmp/logs). Changing the property back to its default value fixed it for me.
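For reference, this is the relevant yarn-site.xml entry; /tmp/logs is the stock default, so removing the override works just as well as setting it explicitly (this is a sketch of my fix, not the only option):

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>

You can confirm where aggregation is actually writing by running hdfs dfs -ls against the configured path after an application finishes.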
NOTE:
If the log aggregation directory is misconfigured, it also causes an error when trying to access job history from the web UI, which looks like: "Log aggregation has not completed or is not enabled".
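A quick way to verify the fix from the command line is to fetch the aggregated logs for a finished application (the application ID below is a placeholder; substitute a real one from the -list output above):
CMD> yarn logs -applicationId application_1234567890123_0001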