
What is HBase compaction-queue-size at all?

Does anyone know what the RegionServer compaction queue size actually means?

By the documentation's definition:

9.2.5. hbase.regionserver.compactionQueueSize Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction.

So it is the number of Stores (or StoreFiles? I have heard both versions) on the RegionServer that need to be major compacted.

I have a job that writes data in a hotspot pattern using sequential (non-distributed) keys. In the metric history I saw that at one point compaction-queue-size was 4. That should be theoretically impossible, since with sequential keys I only write to one Store at any time.

Then I dug into the log and found no hint of a queue size > 0: every major compaction says "This selection was in queue for 0sec".

013-11-26 12:28:00,778 INFO [regionserver60020-smallCompactions-1385440028938] regionserver.HStore: Completed major compaction of 3 file(s) in f1 of myTable.key.md5.... into md5....(size=607.8 M), total size for store is 645.8 M. This selection was in queue for 0sec, and took 39sec to execute.

Even more confusing: wasn't multi-threaded compaction enabled in an earlier version, with each compaction job simply assigned to a thread? If so, why does a compaction queue exist at all?

Too bad that there's no detailed explanation in the HBase docs.

WeiChing 林煒清 asked Nov 27 '13 00:11


1 Answer

I don't fully understand your question. But let me attempt to answer it to the best of my abilities.

First, let's talk about some HBase terminology.

Table       (HBase table)
  Region      (Regions for the table)
    Store       (Store per ColumnFamily for each Region for the table)
      MemStore    (MemStore for each Store for each Region for the table)
      StoreFile   (StoreFiles for each Store for each Region for the table)
        Block       (Blocks within a StoreFile within a Store for each Region for the table)

A Region in HBase is defined as the rows between two row keys. If you have more than one ColumnFamily in your Table, you will get one Store per ColumnFamily per Region. Every Store has one MemStore and zero or more StoreFiles.
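To make the hierarchy concrete, here is a minimal Python sketch of it. It is a toy model, not HBase code: the class names mirror the terms above, while `make_region` and `count_stores` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Store:
    """One Store per ColumnFamily per Region."""
    column_family: str
    memstore: Dict[bytes, bytes] = field(default_factory=dict)  # in-memory writes
    store_files: List[str] = field(default_factory=list)        # zero or more files


@dataclass
class Region:
    """The rows between two row keys, with one Store per ColumnFamily."""
    start_key: bytes
    end_key: bytes
    stores: Dict[str, Store] = field(default_factory=dict)


def make_region(start_key: bytes, end_key: bytes, column_families: List[str]) -> Region:
    # Hypothetical helper: builds one empty Store for each ColumnFamily.
    return Region(start_key, end_key, {cf: Store(cf) for cf in column_families})


def count_stores(regions: List[Region]) -> int:
    # Hypothetical helper: total Stores across the given Regions.
    return sum(len(region.stores) for region in regions)


regions = [
    make_region(b"a", b"m", ["f1", "f2"]),
    make_region(b"m", b"z", ["f1", "f2"]),
]
print(count_stores(regions))  # 2 regions x 2 column families = 4
```

So a table with a single ColumnFamily still has one Store for every Region, which means the number of Stores on a RegionServer grows with the number of Regions it hosts.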

StoreFiles are created when the MemStore is flushed. Every so often, a background thread triggers a compaction to keep the number of files in check. There are two types of compaction: minor and major. A minor compaction picks up some adjacent StoreFiles in a Store and rewrites them as one. It does not remove deleted or expired data. If a minor compaction picks up all the StoreFiles in a Store, it is promoted to a major compaction. In a major compaction, all StoreFiles of a Store are rewritten as one StoreFile.
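The promotion rule can be sketched in a few lines of Python. This is only an illustration of the description above: the `compact` function and its file-name strings are hypothetical, and it does not reproduce HBase's actual file selection policy.

```python
from typing import List, Tuple


def compact(store_files: List[str], start: int, count: int) -> Tuple[List[str], str]:
    """Rewrite `count` adjacent files, beginning at index `start`, as one file.

    Returns the new file list and the kind of compaction that took place.
    """
    selected = store_files[start:start + count]
    if len(selected) < 2:
        raise ValueError("need at least two adjacent files to compact")
    # Selecting every file in the Store promotes the compaction to major.
    kind = "major" if len(selected) == len(store_files) else "minor"
    merged = "+".join(selected)
    return store_files[:start] + [merged] + store_files[start + count:], kind


files = ["sf1", "sf2", "sf3", "sf4"]
print(compact(files, 1, 2))  # (['sf1', 'sf2+sf3', 'sf4'], 'minor')
print(compact(files, 0, 4))  # (['sf1+sf2+sf3+sf4'], 'major')
```

This would also be consistent with the log line in the question, where a compaction of all 3 files in f1 is reported as a major compaction even though it ran on the smallCompactions thread.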

Ok... so what is the compaction queue size? It is the number of Stores in a RegionServer that have been targeted for compaction. Similarly, the flush queue size is the number of MemStores that are awaiting flush.

As to the question of why there is a queue when you can do it asynchronously, I have no idea. This would be a great question to ask on the HBase mailing list. It tends to have faster response times.

EDIT: The compaction queue exists so that compactions do not take up 100% of a RegionServer's resources.
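A rough way to picture this: if only a fixed number of compactions may run at once, any further requests have to wait, and the waiting requests form the queue. The sketch below is a toy model under that assumption. `ToyCompactionPool` and `max_threads` are hypothetical names, not HBase classes or settings, and whether HBase's metric counts only waiting Stores or also running ones is not something this sketch settles.

```python
from collections import deque
from typing import Deque, List


class ToyCompactionPool:
    """Toy model: at most `max_threads` compactions run; the rest wait."""

    def __init__(self, max_threads: int) -> None:
        self.max_threads = max_threads
        self.running: List[str] = []
        self.waiting: Deque[str] = deque()

    def request(self, store: str) -> None:
        # Run immediately if a thread is free, otherwise enqueue the Store.
        if len(self.running) < self.max_threads:
            self.running.append(store)
        else:
            self.waiting.append(store)

    def finish(self, store: str) -> None:
        # Free the thread and hand it to the longest-waiting Store, if any.
        self.running.remove(store)
        if self.waiting:
            self.running.append(self.waiting.popleft())

    @property
    def queue_size(self) -> int:
        # In this toy model, the queue size counts only waiting Stores.
        return len(self.waiting)


pool = ToyCompactionPool(max_threads=1)
for store in ["region1/f1", "region2/f1", "region3/f1"]:
    pool.request(store)
print(pool.queue_size)  # 2
pool.finish("region1/f1")
print(pool.queue_size)  # 1
```

With more threads than pending requests the queue stays empty, which is one way a log could report "in queue for 0sec" for every compaction.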

Pradeep Gollakota answered Sep 21 '22 08:09