
Why is scan.setCacheBlocks(false) recommended for MapReduce jobs?

I understand why scan.setCaching is good for MapReduce jobs, but I don't understand why leaving block caching enabled (setCacheBlocks(true)) is bad. Does it overburden the server?

asked Oct 16 '25 08:10 by hba


1 Answer

In short, yes: enabling block caching (setCacheBlocks(true)) in MapReduce jobs burdens the RegionServer.
MapReduce jobs typically read their input with large sequential scans, so blocks that were just scanned are very unlikely to be read again by the same job. The BlockCache is an LRU cache: each block is put into the BlockCache on its first read, is never requested again, and is eventually evicted to make room for the next scanned blocks, and then the cycle repeats. The RegionServer therefore continuously churns data in and out of the BlockCache for no gain, and in doing so it can evict frequently read blocks that other clients rely on. It is a lot of unnecessary work.
For normal reads, however, it is advisable to keep it set to true, so that blocks that are read repeatedly can be served from memory.
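To make the cache-churn argument concrete, here is a toy simulation. It is only a sketch under stated assumptions: the cache is a plain LRU built on `LinkedHashMap` (HBase's real BlockCache implementations are more elaborate), the sizes are hypothetical, and the `cacheBlocks` flag merely mirrors what `scan.setCacheBlocks(...)` controls. It also assumes that a read with caching disabled still gets a hit if the block is already cached but never inserts new blocks. A set of "hot" blocks is warmed by online reads, a one-pass MapReduce-style scan runs, and then the hot set is read again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheScanDemo {

    static final int CACHE_CAPACITY = 100;  // blocks the cache can hold (hypothetical)
    static final int HOT_BLOCKS = 50;       // blocks 0..49, read repeatedly by online clients
    static final int SCANNED_BLOCKS = 1000; // blocks 1000..1999, read once by the scan

    /** Plain LRU cache of block IDs: a simplified stand-in for the RegionServer BlockCache. */
    static final class LruBlockCache {
        private final int capacity;
        private final LinkedHashMap<Integer, Boolean> blocks;
        int hits = 0, misses = 0, evictions = 0;

        LruBlockCache(int capacity) {
            this.capacity = capacity;
            // accessOrder = true: get() moves an entry to the most-recently-used end
            this.blocks = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                    boolean evict = size() > LruBlockCache.this.capacity;
                    if (evict) evictions++; // drop the least-recently-used block
                    return evict;
                }
            };
        }

        /** Reads one block; cacheBlocks mimics the Scan flag. Returns true on a cache hit. */
        boolean read(int blockId, boolean cacheBlocks) {
            if (blocks.get(blockId) != null) {
                hits++;
                return true;
            }
            misses++;
            if (cacheBlocks) {
                blocks.put(blockId, Boolean.TRUE); // may evict another block
            }
            return false;
        }
    }

    /** Counters from one simulated run. */
    static final class Result {
        final int scanHits;         // cache hits the scan itself got
        final int scanEvictions;    // blocks evicted while the scan ran
        final int hotHitsAfterScan; // hot blocks still served from cache afterwards

        Result(int scanHits, int scanEvictions, int hotHitsAfterScan) {
            this.scanHits = scanHits;
            this.scanEvictions = scanEvictions;
            this.hotHitsAfterScan = hotHitsAfterScan;
        }
    }

    static Result simulate(boolean scanCachesBlocks) {
        LruBlockCache cache = new LruBlockCache(CACHE_CAPACITY);
        for (int b = 0; b < HOT_BLOCKS; b++) cache.read(b, true); // online reads warm the cache

        int hitsBefore = cache.hits, evictionsBefore = cache.evictions;
        for (int b = 1000; b < 1000 + SCANNED_BLOCKS; b++) cache.read(b, scanCachesBlocks);
        int scanHits = cache.hits - hitsBefore;
        int scanEvictions = cache.evictions - evictionsBefore;

        int hotHits = 0;
        for (int b = 0; b < HOT_BLOCKS; b++) if (cache.read(b, true)) hotHits++;
        return new Result(scanHits, scanEvictions, hotHits);
    }

    public static void main(String[] args) {
        for (boolean cacheBlocks : new boolean[] {true, false}) {
            Result r = simulate(cacheBlocks);
            System.out.printf("cacheBlocks=%b: scan hits=%d, evictions during scan=%d, hot hits after scan=%d/%d%n",
                    cacheBlocks, r.scanHits, r.scanEvictions, r.hotHitsAfterScan, HOT_BLOCKS);
        }
    }
}
```

In this model the scan gets zero cache hits either way, so caching its blocks buys it nothing. With caching on, the scan forces 950 evictions and pushes every hot block out (0/50 hits afterwards); with caching off, the hot set survives intact (50/50 hits).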

answered Oct 17 '25 21:10 by Chandra kant


