Spark executor logs on YARN

I'm launching a distributed Spark application in YARN client mode on a Cloudera cluster. After some time I see errors in Cloudera Manager: some executors get disconnected, and this happens consistently. I would like to debug the issue, but the internal exception is not reported by YARN.

Exception from container-launch with container ID: container_1417503665765_0193_01_000003 and exit code: 1
ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

How can I see the stack trace of the exception? YARN seems to report only that the application exited abnormally. Is there a way to expose the Spark executor logs through the YARN configuration?

asked Dec 06 '14 by Nicola Ferraro

People also ask

Where are Spark executor logs stored?

Standalone mode: Spark executor logs are located in the $SPARK_HOME/work/app-<AppName> directory (where <AppName> is the name of your application). This directory also contains each executor's stdout and stderr.
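
For example, on a standalone cluster you could inspect an application's executor logs like this (the application ID below is hypothetical; the actual directory name depends on your cluster):

    # list the per-executor subdirectories for one application (hypothetical app ID)
    ls $SPARK_HOME/work/app-20141206201500-0000/
    # each numbered subdirectory is one executor; its stderr holds the stack traces
    cat $SPARK_HOME/work/app-20141206201500-0000/0/stderr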

Can you use Spark with YARN?

There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is used only for requesting resources from YARN.
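
As a sketch, both modes are selected on the spark-submit command line (the class name and JAR are placeholders; older Spark 1.x releases used --master yarn-client and --master yarn-cluster instead):

    # client mode: the driver runs inside the spark-submit process
    spark-submit --master yarn --deploy-mode client --class com.example.MyApp myapp.jar
    # cluster mode: the driver runs inside the YARN application master
    spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar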

What are executors when we run Spark on YARN?

When running Spark on YARN, each Spark executor runs as a YARN container. Where MapReduce schedules a container and fires up a JVM for each task, Spark hosts multiple tasks within the same container. This approach enables several orders of magnitude faster task startup time.
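
For illustration, the size and number of those containers are controlled from spark-submit; the values below are hypothetical:

    # request 4 executor containers, each a single JVM running up to 4 concurrent tasks
    spark-submit --master yarn --deploy-mode client \
        --num-executors 4 --executor-cores 4 --executor-memory 2g \
        --class com.example.MyApp myapp.jar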


1 Answer

Check the NodeManager's yarn.nodemanager.log-dirs property. That is where container logs, including the Spark executor's stdout and stderr, are written while the container is running.
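
For example, you could locate the directory and read the failing container's stderr while it is still on the node (the yarn-site.xml path and log directory below assume a typical Cloudera layout; adjust for your cluster):

    # find the configured log directory (config path is an assumption)
    grep -A1 'yarn.nodemanager.log-dirs' /etc/hadoop/conf/yarn-site.xml
    # logs are grouped by application and container ID, using the IDs from the question
    cat /var/log/hadoop-yarn/container/application_1417503665765_0193/container_1417503665765_0193_01_000003/stderr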

Note that when the application finishes, the NodeManager may remove the local files (log aggregation). See this document for details: http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
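
Once aggregation has run, the logs can be fetched with the yarn CLI, using the application ID embedded in the container ID from the question (this requires yarn.log-aggregation-enable to be true on the cluster):

    # dump all container logs for the application
    yarn logs -applicationId application_1417503665765_0193
    # or narrow to the failing container (the node address is a placeholder host:port)
    yarn logs -applicationId application_1417503665765_0193 \
        -containerId container_1417503665765_0193_01_000003 \
        -nodeAddress nodemanager-host:8041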

answered Sep 23 '22