Why is scan.setCacheBlocks(false) recommended for MapReduce jobs?

Question

I understand why scan.setCaching is good for MapReduce jobs, but I don't understand why leaving block caching enabled, i.e. setCacheBlocks(true), is bad. Does it overburden the server?

Chandra kant · Accepted Answer

In short: yes, setting block caching to true in MapReduce jobs burdens the RegionServer.
MapReduce jobs mostly run full scans over their input, so there is a high probability that recently scanned input will be discarded in the next map phase. The BlockCache is LRU. It puts a block into the cache on the first request, finds it of no use on later requests, evicts it, and the process continues. So the RegionServer is continuously moving data in and out of the BlockCache for no gain. It's just a lot of unnecessary I/O.
For normal reads, however, it's advisable to keep it true so that repeated reads benefit from the cache.
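
To make the eviction churn concrete, here is a minimal sketch in plain Java. It is not HBase code: `BlockCacheSketch`, `LruCache`, and `hotHitsAfterScan` are hypothetical names, and the cache is a simple `LinkedHashMap`-based LRU standing in for the BlockCache. It warms the cache with a small set of "hot" blocks (normal reads), runs one large scan over other blocks, and then counts how many hot blocks are still cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockCacheSketch {

    /** Tiny LRU cache keyed by block id. A stand-in only, not HBase's BlockCache. */
    static final class LruCache extends LinkedHashMap<Integer, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity; // evict the least recently used block when over capacity
        }
    }

    /**
     * Warms the cache with hot blocks, runs one full scan over other blocks,
     * then re-reads the hot blocks and returns how many were still cached.
     * cacheScanBlocks plays the role of the scan's block-caching flag.
     */
    static int hotHitsAfterScan(int capacity, int hotBlocks, int scanBlocks, boolean cacheScanBlocks) {
        LruCache cache = new LruCache(capacity);

        // Normal reads: a small working set that is read again and again.
        for (int b = 0; b < hotBlocks; b++) {
            cache.put(b, true);
        }

        // MapReduce-style scan: every block is read exactly once.
        for (int b = 0; b < scanBlocks; b++) {
            int blockId = 1_000_000 + b; // ids chosen so they never collide with hot blocks
            if (cache.get(blockId) == null && cacheScanBlocks) {
                cache.put(blockId, true);
            }
        }

        // Normal reads resume: count hits on the hot working set.
        int hits = 0;
        for (int b = 0; b < hotBlocks; b++) {
            if (cache.get(b) != null) {
                hits++;
            } else {
                cache.put(b, true);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("hot hits, scan blocks cached:     " + hotHitsAfterScan(100, 50, 10_000, true));
        System.out.println("hot hits, scan blocks not cached: " + hotHitsAfterScan(100, 50, 10_000, false));
    }
}
```

With a capacity of 100 blocks and a 10,000-block scan, caching the scanned blocks leaves 0 of the 50 hot blocks in the cache, while skipping the cache for the scan leaves all 50. In an actual job, the corresponding setting is `scan.setCacheBlocks(false)` on the `Scan` passed to the job, typically alongside a larger `scan.setCaching(...)` value.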
