I am reading about garbage collection tuning in Spark: The Definitive Guide by Bill Chambers and Matei Zaharia. This chapter is largely based on Spark's documentation. Nevertheless, the authors extend the documentation with an example of how to deal with too many minor collections but not many major collections.
Both the official documentation and the book state:
If there are too many minor collections but not many major GCs, allocating more memory for Eden would help. You can set the size of the Eden to be an over-estimate of how much memory each task will need. If the size of Eden is determined to be E, then you can set the size of the Young generation using the option -Xmn=4/3*E. (The scaling up by 4/3 is to account for space used by survivor regions as well.) (See here)
The book offers an example (Spark: The Definitive Guide, first ed., p. 324):
If your task is reading data from HDFS, the amount of memory used by the task can be estimated by using the size of the data block read from HDFS. Note that the size of a decompressed block is often two or three times the size of the block. So if you want to have three or four tasks' worth of working space, and the HDFS block size is 128 MB, we can estimate size of Eden to be 43,128 MB.
Even assuming that each uncompressed block takes as much as 512 MB, that we have 4 tasks, and that we scale up by 4/3, I don't see how you can arrive at an estimate of 43,128 MB of memory for Eden.
Given the book's assumptions, I would instead estimate that ~3 GB should be enough for Eden.
Could anyone explain how this estimation should be calculated?
OK, I think the new Spark docs make it clear:
As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the size of the block. So if we wish to have 3 or 4 tasks’ worth of working space, and the HDFS block size is 128 MB, we can estimate size of Eden to be 4*3*128MB.
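So, it's 4*3*128 MB (i.e. 1,536 MB) rather than what the book says (i.e. 43,128 MB). The book's figure looks like the same expression with the multiplication signs lost in typesetting.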
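To make the arithmetic concrete, here is a minimal sketch of the estimate in Python. The function names (`estimate_eden_mb`, `young_gen_mb`) are hypothetical helpers for illustration, not part of Spark. The inputs are the upper-end values from the docs' example (4 tasks, a decompression factor of 3, a 128 MB block). Rounding up to a whole megabyte and printing the flag in the JVM's `-Xmn<size>m` form are my own assumptions.

```python
def estimate_eden_mb(tasks: int, decompression_factor: int, block_mb: int) -> int:
    """Eden estimate: working space for `tasks` tasks, each holding one
    decompressed HDFS block of roughly `decompression_factor * block_mb` MB."""
    return tasks * decompression_factor * block_mb


def young_gen_mb(eden_mb: int) -> int:
    """Scale Eden by 4/3 to leave room for the survivor regions.
    Rounds up to a whole MB (an assumption, the docs do not specify rounding)."""
    return (eden_mb * 4 + 2) // 3  # integer ceiling of eden_mb * 4 / 3


# Values taken from the Spark docs example: 4 tasks, 3x decompression, 128 MB blocks.
eden = estimate_eden_mb(tasks=4, decompression_factor=3, block_mb=128)
young = young_gen_mb(eden)

print(f"Eden estimate: {eden} MB")          # 4 * 3 * 128
print(f"Young generation: -Xmn{young}m")    # Eden scaled by 4/3
```

With these inputs Eden comes out at 1,536 MB and the Young generation at 2,048 MB, which is the same order of magnitude as the ~3 GB guess in the question and nowhere near 43,128 MB.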