What is the difference between a store file and an HFile?
I have a basic idea of compaction, i.e. store files are merged together to reduce disk seeks.
Is that correct? Can someone explain more about compaction, such as the exact process and how it works?
HBase tries to combine HFiles to reduce the maximum number of disk seeks needed for a read. This process is called compaction. A compaction chooses some files from a single store in a region and combines them.
When something is written to HBase, it is first written to an in-memory store (memstore). Once this memstore reaches a certain size, it is flushed to disk into a store file (everything is also written immediately to a log file for durability). The store files (or HFiles) created on disk are immutable.
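The write path described above can be sketched as a toy model. This is only an illustration, not HBase code: the class ToyStore and its flush_size field are hypothetical names, and the threshold counts cells rather than bytes to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one store: a log, a memstore and immutable store files."""
    flush_size: int = 3  # hypothetical threshold, counted in cells (HBase uses bytes)
    wal: list = field(default_factory=list)
    memstore: dict = field(default_factory=dict)
    store_files: list = field(default_factory=list)

    def put(self, row, value):
        self.wal.append((row, value))  # logged immediately, for durability
        self.memstore[row] = value     # then buffered in memory
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        # A flushed file is sorted by row key; a tuple models immutability.
        self.store_files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}


store = ToyStore(flush_size=3)
for i, row in enumerate(["r3", "r1", "r2", "r5", "r4"]):
    store.put(row, f"v{i}")

print(len(store.store_files), "store file(s),", len(store.memstore), "cell(s) still in memory")
```

Every flush adds one more file, which is why the number of store files keeps growing until something merges them.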
Minor compaction is the process of combining a configurable number of smaller HFiles into one larger HFile. Minor compaction is very important because, without it, reading a particular row can require many disk reads, which reduces overall performance.
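As a rough sketch of the idea, a minor compaction can be modelled as a merge of a few sorted files. The function minor_compact and its max_files parameter are hypothetical, and picking the smallest files is a simplification; HBase's real file selection policy is more involved.

```python
import heapq


def minor_compact(store_files, max_files=3):
    """Merge up to max_files of the smallest files into one; leave the rest untouched."""
    by_size = sorted(store_files, key=len)
    picked, kept = by_size[:max_files], by_size[max_files:]
    if len(picked) < 2:
        return store_files  # nothing worth merging
    # Each input is already sorted by row key, so a streaming merge is enough.
    merged = list(heapq.merge(*picked))
    return kept + [merged]


files = [
    [("r1", "a"), ("r4", "d")],
    [("r2", "b")],
    [("r3", "c"), ("r5", "e")],
    [("r0", "x"), ("r6", "y"), ("r7", "z"), ("r8", "w")],
]
compacted = minor_compact(files, max_files=3)
print(len(files), "files before,", len(compacted), "files after")
```

Note that no cell is dropped here: the three small files become one larger file with the same contents, and the big file is left alone.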
Major compaction, by contrast, combines all the StoreFiles of a store into a single StoreFile. It also removes deleted and expired versions.
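A minimal sketch of a major compaction, under some loud assumptions: cells are (row, timestamp, value) tuples, a value of None stands in for a delete marker, only the newest version of each row is kept, and ttl is a hypothetical expiry in the same units as the timestamps. None of these names come from HBase itself.

```python
def major_compact(store_files, now, ttl):
    """Merge every file into one, keeping only the newest live cell per row."""
    newest = {}
    for store_file in store_files:
        for row, ts, value in store_file:
            if row not in newest or ts > newest[row][0]:
                newest[row] = (ts, value)

    result = []
    for row in sorted(newest):
        ts, value = newest[row]
        if value is None:   # newest cell is a delete marker: drop the row
            continue
        if now - ts > ttl:  # newest cell has expired: drop it
            continue
        result.append((row, ts, value))
    return [result]         # a single file for the whole store


files = [
    [("r1", 10, "a"), ("r2", 10, "b")],
    [("r2", 20, None), ("r3", 5, "old")],
    [("r1", 30, "a2")],
]
print(major_compact(files, now=100, ttl=80))
```

Here r2 disappears because its newest cell is a delete marker, r3 disappears because it is older than the ttl, and only the latest version of r1 survives.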
Store file and HFile are synonyms, used interchangeably for the same concept.
Sometimes the store files are merged together; this is done by a process called compaction.
When the MemStore reaches a given size (hbase.hregion.memstore.flush.size), it flushes its contents to a StoreFile. The number of StoreFiles in a Store increases over time. Compaction is an operation which reduces the number of StoreFiles in a Store, by merging them together, in order to increase performance on read operations. Compactions can be resource-intensive to perform, and can either help or hinder performance depending on many factors.
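To see why fewer StoreFiles helps reads, here is a simplified lookup that counts how many files it has to probe. The function get is hypothetical, and the model ignores mechanisms such as Bloom filters and block indexes that let HBase skip files in practice.

```python
import bisect


def get(row, memstore, store_files):
    """Return (value, files_probed). Newer files are listed last and checked first."""
    if row in memstore:
        return memstore[row], 0
    probed = 0
    for store_file in reversed(store_files):
        probed += 1
        keys = [key for key, _ in store_file]
        i = bisect.bisect_left(keys, row)  # binary search within one sorted file
        if i < len(keys) and keys[i] == row:
            return store_file[i][1], probed
    return None, probed


before = [[("r1", "a")], [("r2", "b")], [("r3", "c")]]
after = [[("r1", "a"), ("r2", "b"), ("r3", "c")]]

print(get("r1", {}, before))  # value found only after probing every file
print(get("r1", {}, after))   # same value, one file probed
```

The worst case is a row in the oldest file, or a row that does not exist at all: the lookup touches every file in the store, so merging files directly lowers that ceiling.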
Compactions fall into two categories: minor and major.