
Spark error: executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

I am working with the following Spark config:

maxCores = 5
driverMemory = 2g
executorMemory = 17g
executorInstances = 100

Issue: Out of 100 executors, my job ends up with only 10 active executors, even though enough memory is available. I even tried setting the number of executors to 250, but only 10 remain active. All I am trying to do is load a multi-partition Hive table and run df.count over it.

Please help me understand what is causing the executors to be killed:
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called

I am not sure why YARN is killing my executors.

Vishal asked Dec 20 '17 13:12


People also ask

What is CoarseGrainedExecutorBackend?

CoarseGrainedExecutorBackend is an ExecutorBackend to manage a single coarse-grained executor (that lives as long as the owning executor backend). CoarseGrainedExecutorBackend registers itself as a ThreadSafeRpcEndpoint under the name Executor to communicate with the driver.

What is Spark executor memoryOverhead?

Memory overhead is the amount of off-heap memory allocated to each executor. By default, memory overhead is set to either 10% of executor memory or 384 MB, whichever is higher.
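
To make the default rule concrete, here is a minimal sketch that applies it to the 17g executor setting from the question. The 10% factor and the 384 floor come from the text above; the function names are made up for illustration, and the assumption that the YARN container request is executor memory plus overhead (before any rounding by the YARN scheduler) should be checked against your own Spark version.

```python
# Sketch of the default memory-overhead rule described above.
# Function names are hypothetical; values are in MB.
MIN_OVERHEAD_MB = 384       # floor quoted in the text above
OVERHEAD_FRACTION = 0.10    # 10% of executor memory

def memory_overhead_mb(executor_memory_mb):
    """Return the larger of 10% of executor memory and the 384 MB floor."""
    return max(int(executor_memory_mb * OVERHEAD_FRACTION), MIN_OVERHEAD_MB)

def container_request_mb(executor_memory_mb):
    """Executor heap plus overhead (assumption: before any YARN rounding)."""
    return executor_memory_mb + memory_overhead_mb(executor_memory_mb)

executor_memory_mb = 17 * 1024  # executorMemory=17g from the question
print(memory_overhead_mb(executor_memory_mb))    # 1740
print(container_request_mb(executor_memory_mb))  # 19148
```

So each executor would ask YARN for roughly 18.7 GB rather than 17 GB, which is worth keeping in mind when comparing the request against the memory available per node.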


1 Answer

I faced a similar issue, and investigating the NodeManager logs led me to the root cause. You can access them via the web interface:

nodeManagerAddress:PORT/logs

The PORT is set in yarn-site.xml under yarn.nodemanager.webapp.address (default: 8042).

My investigation workflow:

  1. Collect the logs (yarn logs ... command)
  2. Identify the node and container (in these logs) emitting the error
  3. Search the NodeManager logs around the timestamp of the error for a root cause
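
Steps 2 and 3 can be sketched in a few lines of Python, assuming the logs have already been saved to disk and read into lists of lines. The function names are hypothetical, and the NodeManager timestamp layout ("yyyy-MM-dd HH:mm:ss" at the start of each line) is an assumption about the log4j pattern in use; adjust it if your logs look different. The executor timestamp layout is taken from the log lines in the question.

```python
from datetime import datetime, timedelta

SIGTERM_MARKER = "RECEIVED SIGNAL TERM"

def sigterm_times(executor_log_lines):
    """Return timestamps of executor log lines that report a SIGTERM."""
    times = []
    for line in executor_log_lines:
        if SIGTERM_MARKER not in line:
            continue
        try:
            # Executor lines start with "yy/MM/dd HH:mm:ss", e.g. "17/12/20 11:08:21".
            times.append(datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S"))
        except ValueError:
            continue  # marker found, but the line has no leading timestamp
    return times

def nodemanager_lines_near(nm_log_lines, when, window_seconds=60):
    """Return NodeManager lines logged within window_seconds of `when`.

    Assumes each line starts with "yyyy-MM-dd HH:mm:ss"; lines that do not
    parse (for example stack-trace continuations) are skipped.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in nm_log_lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits

# The error line from the question, used as input for step 2.
executor_lines = [
    "17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM",
    "17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called",
]
print(sigterm_times(executor_lines))
```

Each timestamp returned by sigterm_times can then be passed to nodemanager_lines_near together with the lines of the NodeManager log from the affected node, which narrows the search to the minute around the kill.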

By the way, you can access the aggregated collection (XML) of all configurations affecting a node on the same port:

nodeManagerAddress:PORT/conf
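
As a rough illustration, the XML served under /conf can be turned into a plain dictionary once it has been downloaded. This sketch assumes the usual Hadoop configuration layout (a configuration root with property elements holding name and value children), plain http, and a hypothetical host name; the sample document is hand-written for the example and is not real output.

```python
import xml.etree.ElementTree as ET

def nodemanager_urls(host, port=8042):
    """Build the /logs and /conf URLs (assumes plain http; 8042 is the default port)."""
    base = f"http://{host}:{port}"
    return {"logs": f"{base}/logs", "conf": f"{base}/conf"}

def parse_hadoop_conf(xml_text):
    """Map property names to values from a Hadoop-style configuration XML string."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name] = prop.findtext("value")
    return conf

# Hand-written sample in the assumed layout; the value is illustrative only.
sample = """<configuration>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:8042</value>
  </property>
</configuration>"""

print(nodemanager_urls("nodemanager.example.com"))
print(parse_hadoop_conf(sample)["yarn.nodemanager.webapp.address"])
```

Having the settings in a dictionary makes it easy to compare nodes or to look up memory-related YARN properties while reading the NodeManager logs.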
maffe answered Sep 27 '22 19:09