
HBase: Store file vs HFile and Compaction

Tags:

hbase

What is the difference between a store file and an HFile?

I have a basic idea of compaction: store files are merged together to reduce the number of disk seeks.

Is that correct? Can someone explain compaction in more detail, such as the exact process and how it works?

Srinu Katta asked Jul 23 '15 23:07


People also ask

What is compaction in HBase?

HBase tries to combine HFiles to reduce the maximum number of disk seeks needed for a read. This process is called compaction. A compaction chooses some files from a single store in a region and combines them.

What is store file in HBase?

When something is written to HBase, it is first written to an in-memory store (the memstore). Once the memstore reaches a certain size, it is flushed to disk into a store file. Every write is also recorded immediately in a log file for durability. The store files (or HFiles) created on disk are immutable.

What is minor compaction?

Minor compaction is the process of combining a configurable number of smaller HFiles into one larger HFile. Minor compaction is important because without it, reading a particular row can require many disk reads, which reduces overall performance.

Which will remove expired cells from HBase files?

Major compaction. A major compaction combines all the StoreFiles of a store into a single StoreFile. It also removes deleted and expired versions of cells.


2 Answers

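Store file and HFile are used interchangeably to refer to the same concept. Strictly speaking, HFile is the on-disk file format and a StoreFile is HBase's handle on one such file within a store, but in practice the two terms mean the same thing.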

When something is written to HBase, it is first written to an in-memory store (the memstore). Once the memstore reaches a certain size, it is flushed to disk into a store file. Every write is also recorded immediately in a log file for durability. The store files (or HFiles) created on disk are immutable. From time to time the store files are merged together by a process called compaction.
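The write path described above can be sketched as a toy model. This is only an illustration, not HBase code: the `ToyStore` class and its `flush_size` parameter (counted in cells rather than bytes) are made up for this example, and real store files carry indexes, timestamps and multiple versions that are left out here.

```python
import bisect


class ToyStore:
    """Toy model of one store (hypothetical class, not an HBase API)."""

    def __init__(self, flush_size=3):
        self.flush_size = flush_size  # assumed threshold, in cells rather than bytes
        self.log = []                 # stand-in for the durability log
        self.memstore = {}            # in-memory writes, not yet on "disk"
        self.store_files = []         # immutable sorted files, oldest first

    def put(self, row, value):
        self.log.append((row, value))  # every write is logged first
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        """Write the memstore out as one new immutable, sorted store file."""
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row):
        """Check the memstore, then every store file from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            i = bisect.bisect_left(store_file, (row,))
            if i < len(store_file) and store_file[i][0] == row:
                return store_file[i][1]
        return None


store = ToyStore(flush_size=2)
for row, value in [("r1", "a"), ("r2", "b"), ("r3", "c"), ("r1", "d"), ("r4", "e")]:
    store.put(row, value)

print(len(store.store_files))  # 2 files flushed, "r4" is still in the memstore
print(store.get("r1"))         # "d": the newer file wins over the older one
```

Because files are never modified after a flush, a read may have to look in every file of the store. That growing number of lookups is the cost compaction is meant to reduce.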

For more information with statistics, see here. Happy Learning

Ramzy answered Sep 27 '22 16:09


When the MemStore reaches a given size (hbase.hregion.memstore.flush.size), it flushes its contents to a StoreFile. The number of StoreFiles in a Store increases over time. Compaction is an operation which reduces the number of StoreFiles in a Store, by merging them together, in order to increase performance on read operations. Compactions can be resource-intensive to perform, and can either help or hinder performance depending on many factors.

Compactions fall into two categories: minor and major.
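As a rough illustration of the two categories, here is a minimal sketch of compaction as a merge of sorted files. Everything in it is an assumption made for the example: the function names, the `TOMBSTONE` delete marker, and the way the caller picks which files to merge (HBase chooses them through its own compaction policy). Expired cells and multiple versions are not modelled.

```python
import heapq

TOMBSTONE = object()  # hypothetical delete marker for this sketch


def merge_files(files, drop_tombstones):
    """Merge row-sorted files (oldest first) into one row-sorted file.

    For each row only the newest cell is kept.
    """
    # Tag each cell with -age so that, for equal rows, newer files sort first.
    streams = [
        [(row, -age, value) for row, value in store_file]
        for age, store_file in enumerate(files)
    ]
    merged = []
    last_row = None
    for row, _, value in heapq.merge(*streams, key=lambda cell: cell[:2]):
        if row == last_row:
            continue  # an older cell for a row that was already handled
        last_row = row
        if drop_tombstones and value is TOMBSTONE:
            continue  # the delete and everything it masks disappear
        merged.append((row, value))
    return merged


def minor_compact(files, start, count):
    """Merge `count` adjacent files; tombstones stay, since older files remain."""
    merged = merge_files(files[start:start + count], drop_tombstones=False)
    return files[:start] + [merged] + files[start + count:]


def major_compact(files):
    """Merge every file of the store into one and drop deleted cells."""
    return [merge_files(files, drop_tombstones=True)]


files = [
    [("r1", "a"), ("r2", "b")],        # oldest
    [("r2", TOMBSTONE), ("r3", "c")],
    [("r1", "d"), ("r4", "e")],        # newest
]
after_minor = minor_compact(files, 1, 2)  # 3 files become 2
after_major = major_compact(files)        # 3 files become 1
print(len(after_minor), len(after_major))
```

In this sketch the minor compaction has to keep the delete marker for `r2`, because the oldest file, which was not part of the merge, still holds the value it masks. Only the major compaction sees every file and can safely drop both.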

Sujit Fulse answered Sep 27 '22 17:09