
HBase MemStore and Garbage Collection

I am new to HBase, but I have set it up and have some knowledge of HBase and Hadoop.

While studying the HBase MemStore, what I understood is that "the MemStore is the in-memory place where HBase keeps data that has to be written or read". That is why, wherever we read about the MemStore, we also see a discussion of garbage collection.

Now my question: is the MemStore's only purpose to hold readable and writable data in memory? Can we adjust the size of that memory to get faster responses from HBase? And would the garbage collection configuration (the choice of collectors) affect the MemStore? I think the answer should be yes. :)

khan asked May 15 '12 08:05


1 Answer

You are right about the HBase MemStore. In general, when something is written to HBase, it is first written to an in-memory store (the MemStore). Once this MemStore reaches a certain size*, it is flushed to disk into a store file. Everything is also written immediately to a log file (the write-ahead log) for durability.
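To make the write path above concrete, here is a minimal toy sketch in Python. It is not HBase code: the class name `ToyStore`, the `flush_size_bytes` threshold, and the byte accounting are all hypothetical simplifications, and the real log and store files are considerably more involved. It only shows the order of events: log first, buffer in memory, flush when the buffer gets big enough, and read from memory before falling back to flushed files.

```python
class ToyStore:
    """Toy model of the write path described above; not HBase code."""

    def __init__(self, flush_size_bytes):
        self.flush_size_bytes = flush_size_bytes  # hypothetical flush threshold
        self.log = []          # stands in for the durable log file
        self.memstore = {}     # in-memory buffer of recent writes
        self.memstore_bytes = 0
        self.store_files = []  # each flush produces one immutable "file"

    def put(self, key, value):
        # 1. Record the edit in the log first, for durability.
        self.log.append((key, value))
        # 2. Buffer it in memory, adjusting the size if the key is overwritten.
        old = self.memstore.get(key)
        if old is not None:
            self.memstore_bytes -= len(key) + len(old)
        self.memstore[key] = value
        self.memstore_bytes += len(key) + len(value)
        # 3. Flush once the buffer reaches the threshold.
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()

    def flush(self):
        # Write the buffered cells out sorted by key, then start a fresh buffer.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0

    def get(self, key):
        # Newest data wins: check memory first, then store files newest-first.
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None
```

In this toy model a read that hits the in-memory buffer never touches the flushed files, which is the sense in which the MemStore serves both writes and recent reads.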

*From a global perspective, HBase by default uses up to 40% of the heap (see the property hbase.regionserver.global.memstore.upperLimit) for all memstores of all column families of all regions of all tables on a region server. When this limit is reached, HBase starts flushing memstores until the memory they use drops back below 35% of the heap (the matching lowerLimit property). Both limits are adjustable, but you need to calculate carefully before changing them, because the memstores share the heap with everything else the region server holds.
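As a rough illustration of what those percentages mean in bytes, the arithmetic can be sketched as follows. The function name `memstore_watermarks` is made up for this example; the defaults 0.40 and 0.35 are the figures quoted above, and your HBase version may use different values or property names.

```python
def memstore_watermarks(heap_bytes, upper_limit=0.40, lower_limit=0.35):
    """Return (upper, lower) global memstore limits in bytes for one region server.

    Hypothetical helper: it only multiplies the heap size by the two fractions.
    """
    if not 0 < lower_limit <= upper_limit < 1:
        raise ValueError("expected 0 < lower_limit <= upper_limit < 1")
    return round(heap_bytes * upper_limit), round(heap_bytes * lower_limit)


GIB = 1024 ** 3
upper, lower = memstore_watermarks(8 * GIB)
print(f"flushing starts at {upper / GIB:.1f} GiB, stops below {lower / GIB:.1f} GiB")
```

For an 8 GiB heap this works out to roughly 3.2 GiB and 2.8 GiB, so raising the upper limit gives writes more room only by taking heap away from everything else.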

Yes, GC does affect the MemStore, and you can change this behavior by using the MemStore-Local Allocation Buffer (MSLAB). I suggest reading the three-part article "Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers", starting here: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/

AvkashChauhan answered Sep 28 '22 07:09