 

Unable to understand error "SparkListenerBus has already stopped! Dropping event ..."

Tags:

apache-spark

The issue

I'd like to know if anyone has a magic method to avoid such messages in Spark logs:

2015-08-30 19:30:44 ERROR LiveListenerBus:75 - SparkListenerBus has already
stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())

After further investigation, I understand that LiveListenerBus extends AsynchronousListenerBus, so at some point its .stop() method is called. After that, any messages that are still in flight are dropped and remain unprocessed. In this case, some SparkListenerExecutorMetricsUpdate messages have not been received yet, and once they arrive they are simply dropped.

This doesn't look critical, since SparkListenerExecutorMetricsUpdate just corresponds to periodic updates from the executors.

What is embarrassing is that I absolutely don't understand why this happens, and nothing refers to this issue. Note that this is totally non-deterministic and I can't reproduce it, probably due to the asynchronous nature of the bus and my lack of understanding of how/when stop() is supposed to be called.
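
To make the mechanism concrete, here is a deliberately simplified, hypothetical sketch of an asynchronous listener bus — not Spark's actual LiveListenerBus code — showing why an event posted after stop() can only be dropped:

import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicBoolean

// Toy illustration only: a queue, a consumer thread, and a stopped flag.
class ToyListenerBus {
  private val queue = new LinkedBlockingQueue[String]()
  private val stopped = new AtomicBoolean(false)

  // Consumer thread drains the queue until stop() is called and the queue is empty.
  private val consumer = new Thread {
    override def run(): Unit = {
      while (!stopped.get || !queue.isEmpty) {
        val event = queue.poll()
        if (event != null) println(s"processed $event") else Thread.sleep(10)
      }
    }
  }
  consumer.start()

  def post(event: String): Unit = {
    if (stopped.get) {
      // Same situation as the log line above: the bus is stopped, the event has nowhere to go.
      println(s"bus has already stopped! Dropping event $event")
    } else {
      queue.put(event)
    }
  }

  def stop(): Unit = {
    stopped.set(true)
    consumer.join()
  }
}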

About the running code

A minimal sample:

import org.apache.spark.SparkContext

val sc = new SparkContext(sparkConf)
// One accumulator per metric (Metrics, sparkConf and outPaths are defined elsewhere in the job).
val metricsMap = Metrics.values.toSeq.map(
    v => v -> sc.accumulator(0, v.toString)
).toMap
val outFiles = sc.textFile(outPaths)

And there's no other reference to sc or a SparkContext instance.
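
One thing I could try (not verified to remove the message) is stopping the context explicitly, so that the moment the listener bus shuts down is at least deterministic. A hedged sketch, where runJob stands in for the driver logic above:

import org.apache.spark.SparkContext

val sc = new SparkContext(sparkConf)
try {
  runJob(sc) // hypothetical placeholder for the actual driver code
} finally {
  sc.stop()  // single, explicit shutdown point
}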

Asked by Adrien M. on Sep 01 '15


1 Answer

This ticket might be related: https://issues.apache.org/jira/browse/SPARK-12009

The message seems to indicate a YARN allocation failure after the SparkContext has stopped.


Sorry for the unclear comment.

The main reason seems to be that there is some interval between the AM's shutdown event and the point when all executors have stopped.
So the AM tries to reallocate executors after they have stopped.

As Saisai said in the ticket:

An interesting thing is that the AM is shutting down at 2015-11-26 03:05:16, but YarnAllocator still requests 13 executors 11 seconds later. It looks like the AM does not exit that fast, which is why YarnAllocator is still requesting new containers. Normally, if the AM exited as soon as it received the disconnected message, there would be no time for YarnAllocator to request containers.

I have sometimes come across similar logs when the SparkContext is shutting down.
In my case, this ticket seems to be the answer.
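
If you want to check the timing on your side, a rough sketch (using SparkContext.addSparkListener, a developer API that may change between Spark versions) is to log the application-end event and compare its timestamp with the dropped-event messages and the YarnAllocator requests in the AM log:

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Log when the application-end event goes through the listener bus, so its
// timestamp can be correlated with the "Dropping event" messages.
sc.addSparkListener(new SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    println(s"application end event at ${end.time}")
  }
})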

Answered by tabata on Jan 02 '23