
HBase row key design for monotonically increasing keys

Tags: nosql, row, hbase

I have an HBase table where I'm writing row keys like:

<prefix>~1
<prefix>~2
<prefix>~3
...
<prefix>~9
<prefix>~10

A scan in the HBase shell gives this output:

<prefix>~1
<prefix>~10
<prefix>~2
<prefix>~3
...
<prefix>~9

How should a row key be designed so that the row with key <prefix>~10 comes last? I'm also looking for recommended or commonly used approaches to designing HBase row keys.

asked Jul 22 '13 at 16:07 by Mayank


3 Answers

How should a row key be designed so that the row with key <prefix>~10 comes last?

You see the scan output this way because HBase keeps rowkeys sorted lexicographically, irrespective of insertion order. Rowkeys are treated as arrays of bytes and compared byte by byte, so for ASCII keys like these the order is the same as the dictionary order of their string representations. The lowest rowkey appears first in a table. After the common "<prefix>~", the byte "1" is smaller than "2", which is why <prefix>~10 appears before <prefix>~2 no matter what follows, and so on. See the Rows section on this page to learn more about this.
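To see the effect without a cluster, the comparison can be mimicked in Python: bytes objects compare element by element as unsigned values, with a key that is a prefix of a longer key sorting first, which matches how HBase orders rowkeys. This is only an illustration, not HBase client code, and "prefix" stands in for <prefix>.

```python
# Python's bytes objects compare element by element as unsigned values
# (a shorter key that is a prefix of a longer one sorts first), which is
# the same rule HBase applies to rowkeys. "prefix" stands in for <prefix>.
keys = [f"prefix~{i}".encode("ascii") for i in range(1, 11)]  # inserted order

scan_order = sorted(keys)  # the order a scan would return them in
for key in scan_order:
    print(key.decode("ascii"))
# prefix~1, prefix~10, prefix~2, ..., prefix~9
```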

When you left-pad the integers with zeros to a fixed width, their numeric order is preserved under lexicographic sorting, and that's why the scan order then matches the order in which you inserted the data. To do that, you can design your rowkeys as suggested by @shutty.

I'm also looking for recommended or commonly used approaches to designing HBase row keys.

There are some general guidelines to follow in order to devise a good design:

  • Keep the rowkey as small as possible, since it is stored with every cell.
  • Avoid monotonically increasing rowkeys, such as timestamps. This is a poor schema design: all new writes go to the last region, which leads to RegionServer hotspotting. If you can't avoid it, use a technique such as hashing or salting to prevent hotspotting.
  • Avoid using Strings as rowkeys if possible. The string representation of a number takes more bytes than its integer or long representation. For example, a long is 8 bytes, and those eight bytes can hold an unsigned number up to 18,446,744,073,709,551,615. Stored as a String, presuming one byte per character, the same number needs 20 bytes, 2.5 times as many.
  • Use a mechanism such as hashing to get a uniform distribution of rows if your regions are not evenly loaded. You can also create pre-split tables to achieve this.

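To make the size and ordering points about numeric keys concrete, here is a Python sketch (not HBase client code) comparing an 8-byte big-endian encoding of a number with its decimal string. The helper names long_key and string_key are hypothetical.

```python
import struct


def long_key(n: int) -> bytes:
    """Encode a non-negative integer as an 8-byte big-endian unsigned value."""
    # '>Q' = big-endian unsigned 64-bit; struct.error outside 0..2**64-1
    return struct.pack(">Q", n)


def string_key(n: int) -> bytes:
    """Encode the same integer as ASCII decimal digits (one byte per digit)."""
    return str(n).encode("ascii")


max_u64 = 2**64 - 1  # 18,446,744,073,709,551,615
print(len(long_key(max_u64)), len(string_key(max_u64)))  # 8 20

# Fixed-width big-endian bytes already sort in numeric order, no padding needed.
nums = [1, 2, 9, 10, 300, 70000]
assert sorted(nums, key=long_key) == nums
```

As far as I know, the Java client's Bytes.toBytes(long) produces the same big-endian layout for non-negative values, but it is signed, so negative values would sort after positive ones.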
See this link for more on rowkey design.

HTH

answered Oct 26 '22 at 14:10 by Tariq


HBase stores rowkeys in lexicographical order, so you can try this schema with a fixed-length rowkey:

<prefix>~0001
<prefix>~0002
<prefix>~0003
...
<prefix>~0009
<prefix>~0010
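A small Python sketch of this scheme; padded_key and the default width of 4 are hypothetical, so pick a width large enough for the biggest number you expect. A value that needs more digits than the width (for example 10000 with width 4) would sort before 9999 again, so the sketch rejects it.

```python
def padded_key(prefix: str, n: int, width: int = 4) -> bytes:
    """Build '<prefix>~NNNN' with n zero-padded to a fixed number of digits."""
    if not 0 <= n < 10**width:
        # a wider number would break the fixed length and the ordering
        raise ValueError(f"{n} does not fit in {width} digits")
    return f"{prefix}~{n:0{width}d}".encode("ascii")


keys = [padded_key("prefix", i) for i in range(1, 11)]
assert sorted(keys) == keys  # byte order now matches numeric order
print(keys[-1])  # b'prefix~0010'
```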

Keep in mind that you should also use random prefixes to avoid region hotspotting (when a single region accepts most of the writes while the other regions are idle).
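One common way to implement such prefixes is salting. The Python sketch below swaps the purely random prefix for a deterministic one derived from a hash of the key, so a single row can still be located with a point lookup; NUM_BUCKETS, base_key, salted_key and scan_ranges are hypothetical names, and 8 buckets is an arbitrary choice. The trade-off is that a range scan must be issued once per bucket and the results merged on the client.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; must stay fixed once data is written


def base_key(prefix: str, n: int, width: int = 4) -> bytes:
    """Zero-padded '<prefix>~NNNN' key, as in the scheme above."""
    return f"{prefix}~{n:0{width}d}".encode("ascii")


def salted_key(prefix: str, n: int) -> bytes:
    """Prepend a one-byte bucket computed from a hash of the base key."""
    base = base_key(prefix, n)
    # Deterministic salt: the same n always maps to the same bucket,
    # so a point lookup can rebuild the full rowkey.
    bucket = hashlib.md5(base).digest()[0] % NUM_BUCKETS
    return bytes([bucket]) + base


def scan_ranges(prefix: str, start: int, stop: int):
    """One [start_row, stop_row) pair per bucket covering n in [start, stop)."""
    lo, hi = base_key(prefix, start), base_key(prefix, stop)
    return [(bytes([b]) + lo, bytes([b]) + hi) for b in range(NUM_BUCKETS)]


keys = [salted_key("prefix", i) for i in range(1, 11)]
ranges = scan_ranges("prefix", 1, 11)
for key in keys:
    start_row, stop_row = ranges[key[0]]  # first byte is the bucket
    assert start_row <= key < stop_row
```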

answered Oct 26 '22 at 12:10 by shutty


Monotonically increasing keys are not a good schema for HBase. You can read more here: http://hbase.apache.org/book/rowkey.design.html

There is also a link there to OpenTSDB, which solves this problem.

answered Oct 26 '22 at 14:10 by Udy