 

WARN BlockManagerMasterEndpoint: No more replicas available for rdd

I'm seeing the following type of message when caching large DataFrames in PySpark on YARN:

WARN BlockManagerMasterEndpoint: No more replicas available for rdd_23_62 !    

What exactly does this message mean?

Is it causing the subsequent "Container killed on request. Exit code is 143" error?

asked Jun 27 '19 07:06 by Bob



1 Answer

The container killed message is pretty common. It basically means 'something did not go as planned so Spark gave up'.
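The exit code itself narrows things down a little. Under the common convention that a process terminated by signal N exits with status 128 + N, 143 - 128 = 15, which is SIGTERM: the container was asked to stop rather than crashing on its own. The helper below is a hypothetical illustration of that arithmetic; it only names the signal and says nothing about *why* YARN or Spark sent it.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name for a 128 + N exit code, else the plain status."""
    if code > 128:
        try:
            # Exit status 128 + N conventionally means "terminated by signal N".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(143))  # SIGTERM
```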

Though I could not find any great references for the exact error, my understanding is as follows:

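When Spark caches data, it stores it as blocks on the executors across the cluster (primarily in memory, though depending on the storage level it may spill to disk). By default each cached block is kept only once; replicated storage levels such as MEMORY_AND_DISK_2 keep a second copy on another node.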
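To make the idea of "no replica left" concrete, here is a toy model of the bookkeeping involved. `ToyBlockMaster` and its methods are hypothetical and are not Spark's code; the sketch assumes the driver tracks which executors hold each cached block and that the warning fires when the last holder of a block goes away (for example, when the executor holding it is lost). With one copy, losing a single executor loses the block; with two copies, one survives.

```python
import logging
from collections import defaultdict

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("toy_block_master")


class ToyBlockMaster:
    """Hypothetical, simplified tracker of which executors hold each cached block."""

    def __init__(self):
        # block id -> set of executor ids that hold a copy of that block
        self.locations = defaultdict(set)

    def add_replica(self, block_id, executor_id):
        self.locations[block_id].add(executor_id)

    def remove_executor(self, executor_id):
        """Forget every copy held by a lost executor; return blocks left with no copy."""
        lost = []
        for block_id in list(self.locations):  # copy keys so we can delete safely
            holders = self.locations[block_id]
            holders.discard(executor_id)
            if not holders:
                # Assumption: this mirrors when the real warning is logged.
                del self.locations[block_id]
                log.warning("No more replicas available for %s !", block_id)
                lost.append(block_id)
        return lost


master = ToyBlockMaster()
master.add_replica("rdd_23_62", "exec-1")  # single copy (default caching)
master.add_replica("rdd_23_63", "exec-1")  # two copies (a *_2 storage level)
master.add_replica("rdd_23_63", "exec-2")
print(master.remove_executor("exec-1"))  # ['rdd_23_62']
```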

The message itself seems pretty clear: Spark needed a cached block (here rdd_23_62) and could not find any remaining replica of it.
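The block name tells you which data went missing: Spark names cached RDD blocks rdd_<rddId>_<partitionIndex>, so rdd_23_62 is partition 62 of the RDD with id 23. A small, hypothetical log-scanning helper like the one below can collect every affected (RDD, partition) pair, which makes it easier to see how much cached data was lost at once. The function name and regex are illustrative only.

```python
import re

# Cached RDD blocks are named rdd_<rddId>_<partitionIndex>.
NO_REPLICA = re.compile(r"No more replicas available for rdd_(\d+)_(\d+)")


def lost_partitions(log_lines):
    """Collect (rdd_id, partition) pairs from 'No more replicas' warnings."""
    found = []
    for line in log_lines:
        match = NO_REPLICA.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


sample = ["WARN BlockManagerMasterEndpoint: No more replicas available for rdd_23_62 !"]
print(lost_partitions(sample))  # [(23, 62)]
```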

What I do not know is the root cause.

  1. It may be that loading or caching the data failed in the first place
  2. Alternatively, perhaps it was successful in creating a replica, but this cannot be accessed now

In the latter case, I could think of two main reasons:

  • A valid replica exists but accessing it fails (e.g. some kind of networking issue)
  • Though it was valid once, there is no valid replica now. Perhaps it got corrupted or kicked out (e.g. by another application?)

I know this leaves some possibilities open, but hopefully this helps in coming to the root cause.

answered Sep 19 '22 15:09 by Dennis Jaheruddin
