Spark Error : executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Tags: scala, apache-spark

I am working with the following Spark config:

maxCores = 5
driverMemory=2g
executorMemory=17g
executorInstances=100

Issue: Out of 100 executors, my job ends up with only 10 active executors, even though enough memory is available. Even when I set the number of executors to 250, only 10 remain active. All I am trying to do is load a multi-partition Hive table and run df.count over it.

Please help me understand what is causing the executors to be killed:
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called

I am not sure why YARN is killing my executors.

asked Dec 20 '17 13:12

Vishal

1 Answer

I faced a similar issue, and investigating the NodeManager logs led me to the root cause. You can access them via the web interface:

nodeManagerAddress:PORT/logs

The PORT is set in yarn-site.xml under yarn.nodemanager.webapp.address (default: 8042).

My investigation workflow:

1. Collect the logs (yarn logs ... command)
2. Identify the node and container (in these logs) emitting the error
3. Search the NodeManager logs by the timestamp of the error for a root cause
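
Step 2 can be scripted once the logs are collected. Below is a rough Python sketch, not something from my original investigation: it assumes the aggregated output marks each container with a line of the form "Container: <id> on <node>" and that log lines start with Spark's default timestamp. The function name, container ID, and host name are made up for illustration.

```python
import re

# Header line assumed to precede each container's logs in the aggregated output.
CONTAINER_HEADER = re.compile(r"^Container: (\S+) on (\S+)")
# Timestamp at the start of a log line, e.g. "17/12/20 11:08:21".
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def find_sigterm_events(log_text):
    """Return (container, node, timestamp) for every SIGTERM line found."""
    events = []
    container = node = None
    for line in log_text.splitlines():
        header = CONTAINER_HEADER.match(line)
        if header:
            # Remember which container the following lines belong to.
            container, node = header.groups()
            continue
        if "RECEIVED SIGNAL TERM" in line:
            ts = TIMESTAMP.match(line)
            events.append((container, node, ts.group(1) if ts else None))
    return events


# Hypothetical sample: the container ID and host name are invented,
# the log lines are the ones from the question.
sample = """\
Container: container_0000000000000_0001_01_000002 on worker-1.example.com_45454
LogType:stderr
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
"""

events = find_sigterm_events(sample)
for container, node, timestamp in events:
    print(timestamp, container, node)
```

The node and timestamp it prints are what you then look up in that node's NodeManager logs (step 3).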

Btw: you can access the aggregated collection (XML) of all configurations affecting a node on the same port with:

nodeManagerAddress:PORT/conf
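
If you save that XML, a few lines of Python are enough to look up individual settings. This is only a sketch, assuming the response uses the usual Hadoop layout of property elements with name and value children; the function name and the sample values are invented for illustration, and a real response lists many more properties.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_conf(xml_text):
    """Map each property name to its value from a Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name] = prop.findtext("value", default="")
    return conf


# Hypothetical sample standing in for a saved /conf response.
sample = """<configuration>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:8042</value>
    <source>yarn-default.xml</source>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <source>yarn-default.xml</source>
  </property>
</configuration>"""

conf = parse_hadoop_conf(sample)
# The web UI port is the part after the last colon of the address.
port = conf["yarn.nodemanager.webapp.address"].rsplit(":", 1)[1]
print("NodeManager web UI port:", port)
print("Memory available to containers (MB):", conf["yarn.nodemanager.resource.memory-mb"])
```

Settings such as yarn.nodemanager.resource.memory-mb are worth comparing against what your executors request when you suspect YARN of killing containers.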

answered Sep 27 '22 19:09

maffe
