I'm seeing the following kind of message when caching large DataFrames in PySpark on YARN:
WARN BlockManagerMasterEndpoint: No more replicas available for rdd_23_62 !
What exactly does this message mean?
Is it causing the subsequent "Container killed on request. Exit code is 143" error?
The container killed message is pretty common. Exit code 143 corresponds to a SIGTERM (128 + 15), so it basically means 'something did not go as planned, and YARN terminated the container'.
Though I could not find any great references for the exact error, my understanding is as follows:
When Spark caches data, it stores the blocks on executors across the cluster (primarily in memory, though depending on the storage level it may spill to disk). The storage level also determines how many replicas of each block are kept; by default there is only a single replica.
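For illustration, here is a minimal PySpark sketch of that difference (the DataFrame and app name are hypothetical stand-ins, not taken from the question):

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()
    df = spark.range(10_000_000)  # hypothetical stand-in for a large dataframe

    # Single replica per block: memory first, spilling to disk when needed.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    # Alternative: keep two replicas per block, so losing one executor
    # does not leave a block with no replicas at all.
    # df.persist(StorageLevel.MEMORY_AND_DISK_2)

    df.count()  # an action, to actually materialize the cache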
The message itself seems pretty clear: Spark went looking for a cached block (rdd_23_62 is partition 62 of RDD 23), and the last known replica of that block was gone, typically because the executor holding it had been lost.
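As a side note, the block id in the warning encodes which piece of data is affected. A tiny helper (my own, not part of Spark's API) to decode it:

    # Spark names cached RDD blocks "rdd_<rddId>_<partitionId>".
    def parse_rdd_block_id(block_id):
        _, rdd_id, partition = block_id.split("_")
        return int(rdd_id), int(partition)

    print(parse_rdd_block_id("rdd_23_62"))  # -> (23, 62): partition 62 of RDD 23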
What I do not know is the root cause: the warning could be causing the container kill, or it could itself be a consequence of an executor dying. In the latter case, I could think of two main reasons (a configuration sketch for the first one follows this list):

- YARN killed the container because it exceeded its memory limits, which is the usual suspect behind exit code 143; the executor's cached blocks die with it.
- The container was lost for other reasons, for example a node failure or the YARN scheduler preempting it to free resources for another queue.
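If the memory limit turns out to be the problem, a common first step is to give each executor more headroom. A hedged sketch (the values are illustrative, not tuned for any particular cluster):

    from pyspark.sql import SparkSession

    # YARN enforces roughly executor memory + memory overhead per container;
    # exceeding that total is what gets a container killed with exit code 143.
    spark = (
        SparkSession.builder
        .appName("cache-demo")
        .config("spark.executor.memory", "8g")           # on-heap executor memory
        .config("spark.executor.memoryOverhead", "2g")   # off-heap headroom (Spark 2.3+)
        .getOrCreate()
    )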
I know this leaves some possibilities open, but hopefully this helps in coming to the root cause.