I'm using the Spark 2.3 Thrift server for ad-hoc SQL queries. My Spark parameters are set as below in the spark-defaults.conf file:
spark.executor.memory 24G
spark.executor.cores 40
spark.executor.instances 3
However, when I checked the Spark web UI, the number of active tasks was not equal to the number of allocated cores, as the picture below shows:
How can the number of active tasks be bigger than the number of allocated cores? Any ideas? Thanks!
According to the recommendations discussed above: leave 1 core per node for the Hadoop/YARN daemons, so the number of cores available per node = 16 - 1 = 15. Total available cores in the cluster = 15 x 10 = 150. Number of available executors = total cores / cores per executor = 150 / 5 = 30.
The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number of cores in terms of parallel processing.
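If it helps, here is the same arithmetic as a small Scala sketch; the 10-node / 16-core figures are the assumed example values from above, not something read from your cluster:

// Executor-sizing arithmetic from the example above (assumes 10 nodes with 16 cores each).
object ExecutorSizing extends App {
  val nodes            = 10
  val coresPerNode     = 16
  val coresPerExecutor = 5                                // the commonly recommended value

  val usableCoresPerNode = coresPerNode - 1               // leave 1 core for Hadoop/YARN daemons
  val totalCores         = usableCoresPerNode * nodes     // 15 x 10 = 150
  val numExecutors       = totalCores / coresPerExecutor  // 150 / 5 = 30

  println(s"total cores: $totalCores, executors: $numExecutors")
}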
The default for spark.executor.cores is 1 in YARN mode, and all the available cores on the worker in standalone mode.
Number of executors is the number of distinct YARN containers (think processes/JVMs) that will execute your application. Number of executor-cores is the number of threads you get inside each executor (container).
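As a rough sketch of how those two knobs map onto a Spark 2.x session (the values here are illustrative assumptions, not recommendations for your cluster):

// Sketch: setting executor count vs. executor cores on a Spark 2.x session.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("executor-sizing-example")
  .config("spark.executor.instances", "30") // 30 YARN containers (separate JVMs)
  .config("spark.executor.cores", "5")      // 5 concurrent task threads per container
  .getOrCreate()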
I've seen the same thing. I'm not 100% sure, but I believe it is a race condition between the Task threadpool on the executor and the metrics reporting code.
If you click on the thread dump you will see the proper number of threads. However, if you try it 50 times (with a little luck) you will see that an extra task thread is just sitting there in the TIMED_WAITING state.
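If you want to poke at this outside the web UI, a rough Scala sketch like the one below, run inside the executor JVM (for example from within a task), prints the task threads and their states. The "Executor task launch worker" name prefix is what I believe Spark 2.x executors use, so treat that part as an assumption:

// Sketch: list executor task threads and their states (TIMED_WAITING, RUNNABLE, ...).
// The thread-name prefix "Executor task launch worker" is assumed from Spark 2.x.
import scala.collection.JavaConverters._

val taskThreads = Thread.getAllStackTraces.keySet.asScala
  .filter(_.getName.startsWith("Executor task launch worker"))

taskThreads.foreach(t => println(s"${t.getName}: ${t.getState}"))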