Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Spark job fail with "Exit code: 52"

I have had Spark job failing with a trace like this one:

./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container id: container_1455622885057_0016_01_000008
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Exit code: 52
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr:Stack trace: ExitCodeException exitCode=52: 
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.run(Shell.java:456)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.lang.Thread.run(Thread.java:745)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container exited with a non-zero exit code 52

It took me a while to figure out what "exit code 52" means, so I'm putting this up here for the benefit of others who might be searching

like image 556
Virgil Avatar asked Feb 17 '16 09:02

Virgil


1 Answers

The exit code 52 comes from org.apache.spark.util.SparkExitCode, and it is val OOM=52 - i.e. an OutOfMemoryError. Which makes sense since I also find this in the container logs:

16/02/16 17:09:59 ERROR executor.Executor: Managed memory leak detected; size = 4823704883 bytes, TID = 3226
16/02/16 17:09:59 ERROR executor.Executor: Exception in task 26.0 in stage 2.0 (TID 3226)
java.lang.OutOfMemoryError: Unable to acquire 1248 bytes of memory, got 0
        at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

(note that I'm not really sure at this point if the problem is in my code or due to the Tungsten memory leaks, but that's a different issue)

like image 86
Virgil Avatar answered Sep 20 '22 13:09

Virgil