I understand why scan.setCaching is good for mapreduce jobs, but I don't understand why setCacheBlocks(false) is bad. Does it overburden the server?
In short, yes: leaving block caching enabled (the default, equivalent to setCacheBlocks(true)) on a scan that feeds a MapReduce job burdens the RegionServer.
A MapReduce job typically scans its input once, front to back, so a block that has just been read is very unlikely to be requested again before the job moves on. The BlockCache is an LRU cache. Each newly scanned block is put into the cache, is never reused, and is soon evicted to make room for the next one. The RegionServer therefore keeps churning blocks in and out of the BlockCache for no gain, and in the process it pushes out blocks that other clients were actually reusing, so their reads fall back to disk. That is a lot of unnecessary work and I/O.
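For normal reads, such as gets and short scans that revisit the same rows, it's advisable to keep it true so that repeated reads are served from the BlockCache instead of going back to disk.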
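To make the difference concrete, here is a small stand-alone simulation. It is only a sketch and does not use HBase at all: the class LruScanDemo, the method countHits, and the block counts are made up for illustration, and a plain LinkedHashMap stands in for the BlockCache. It counts cache hits when the same range of blocks is read several times through an LRU cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruScanDemo {

    /** Toy LRU cache keyed by block id (a stand-in for the BlockCache, not HBase code). */
    static final class LruCache extends LinkedHashMap<Integer, Boolean> {
        private static final long serialVersionUID = 1L;
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity; // evict the least recently used block when full
        }
    }

    /**
     * Reads blocks 0..blocks-1 in order, `passes` times, through an LRU cache
     * holding `capacity` blocks, and returns how many reads were cache hits.
     */
    static int countHits(int capacity, int blocks, int passes) {
        LruCache cache = new LruCache(capacity);
        int hits = 0;
        for (int p = 0; p < passes; p++) {
            for (int b = 0; b < blocks; b++) {
                if (cache.get(b) != null) {
                    hits++;                      // block was still cached
                } else {
                    cache.put(b, Boolean.TRUE);  // miss: load the block, possibly evicting another
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Full scan over 1000 blocks with room for 100: every block is evicted before it is needed again.
        System.out.println("large scan hits: " + countHits(100, 1000, 3)); // prints 0
        // Small hot set of 50 blocks: everything after the first pass is a hit.
        System.out.println("hot set hits:    " + countHits(100, 50, 3));   // prints 100
    }
}
```

With a scan larger than the cache, every insertion is wasted work, which is the MapReduce case where scan.setCacheBlocks(false) helps. With a small, frequently revisited working set, almost every read is a hit, which is the normal-read case where caching should stay on.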