
HBase - rowkey basics

Tags: nosql, hbase

NOTE: I started with HBase only a few hours ago, and I come from an RDBMS background :P

I have an RDBMS-like table CUSTOMERS with the following columns:

  1. CUSTOMER_ID STRING
  2. CUSTOMER_NAME STRING
  3. CUSTOMER_EMAIL STRING
  4. CUSTOMER_ADDRESS STRING
  5. CUSTOMER_MOBILE STRING

I have thought of the following HBase equivalent:

  table: CUSTOMERS
  rowkey: CUSTOMER_ID
  column family: CUSTOMER_INFO
  columns: NAME, EMAIL, ADDRESS, MOBILE

From what I have read, a primary key in an RDBMS table is roughly similar to an HBase table's rowkey. Accordingly, I want to keep CUSTOMER_ID as the rowkey.

My questions are dumb and straightforward :

  1. Whether I use a shell command or the HBaseAdmin Java class, how do I define the rowkey? I didn't find anything for it either in the shell or in the HBaseAdmin class (something like HBaseAdmin.createSuperKey(...)).
  2. Given an HBase table, how do I determine the rowkey details, i.e. which values are used as the rowkey?
  3. I understand that rowkey design is critical. Suppose customer IDs take values like CUST_12345, CUST_34434 and so on; how will HBase use the rowkey to decide in which region particular rows reside (assuming that the region concept is similar to horizontal partitioning in a DB)?

***Edited to add sample code snippet

I'm simply trying to create one row for the customer table using 'put' in the shell. I did this :

hbase(main):011:0> put  'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:NAME','Omkar Joshi'
0 row(s) in 0.1030 seconds

hbase(main):012:0> scan 'CUSTOMERS'
ROW                              COLUMN+CELL
 CUSTID12345                     column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0500 seconds

hbase(main):013:0> put  'CUSTOMERS', 'CUSTID614', 'CUSTOMER_INFO:NAME','Prachi Shah', 'CUSTOMER_INFO:EMAIL','[email protected]'

ERROR: wrong number of arguments (6 for 5)

Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates.  To put a cell value into table 't1' at
row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 't1', 'r1', 'c1', 'value', ts1


hbase(main):014:0> put  'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:EMAIL','[email protected]'
0 row(s) in 0.0160 seconds

hbase(main):015:0>
hbase(main):016:0* scan 'CUSTOMERS'
ROW                              COLUMN+CELL
 CUSTID12345                     column=CUSTOMER_INFO:EMAIL, timestamp=1365600369284, value=[email protected]
 CUSTID12345                     column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0230 seconds

As put takes at most 5 arguments, I was not able to figure out how to insert the entire row in one put command. This results in incremental versions of the same row, which isn't what I want, and I'm not sure whether CUSTOMER_ID is being used as the rowkey! Thanks and regards!
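One way to picture what the two successful puts in the transcript did is the toy model below. It is only a sketch in plain Python, not the HBase client API: the `table` dict and the `put` and `scan` helpers are hypothetical names made up for illustration, and the email value is a placeholder. It assumes that a cell is addressed by rowkey, column and timestamp, which is what the scan output suggests.

```python
from collections import defaultdict

# Toy in-memory stand-in for an HBase table (NOT the real client API).
# Each cell is addressed by (rowkey, "family:qualifier", timestamp).
table = defaultdict(dict)  # rowkey -> {(column, timestamp): value}

def put(rowkey, column, value, timestamp):
    """Store one cell; the rowkey is simply the row argument of the put."""
    table[rowkey][(column, timestamp)] = value

def scan():
    """Yield rows in sorted rowkey order with the newest value per column."""
    for rowkey in sorted(table):
        latest = {}
        for (column, ts), value in table[rowkey].items():
            if column not in latest or ts > latest[column][0]:
                latest[column] = (ts, value)
        yield rowkey, {col: val for col, (ts, val) in sorted(latest.items())}

# The two puts from the transcript (email is a hypothetical placeholder).
put("CUSTID12345", "CUSTOMER_INFO:NAME", "Omkar Joshi", 1365600052104)
put("CUSTID12345", "CUSTOMER_INFO:EMAIL", "omkar@example.invalid", 1365600369284)

rows = list(scan())
for rowkey, columns in rows:
    print(rowkey, columns)
```

In this model the two puts land in one row with two cells, because they share the rowkey and differ only in the column. A new version would appear only if the same column of the same row were written again with a later timestamp.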

Kaliyug Antagonist asked Apr 10 '13 05:04



1 Answer

  1. You don't. The key (and any other column, for that matter) is a byte array; you can put whatever you want there, even encapsulate sub-entities. Whatever you pass as the row argument of a put is the rowkey, so in your example 'CUSTID12345' already is the rowkey.

  2. Not sure I understand that. Each value is stored as key + column family + column qualifier + timestamp + value, so the key is always there.

  3. HBase keeps rows sorted by rowkey, and each region holds a contiguous range of that sorted key space, so the region a record goes to is figured out as data arrives. When a region gets too big, HBase splits it. Also, from time to time, when there is too much junk, HBase performs compactions to rearrange the files. You can control the partitioning by pre-splitting the table yourself, which is something you should definitely think about in the future. However, since it seems you are just starting out with HBase, you can let HBase take care of that for now. Once you understand your usage patterns and data better, you will probably want to revisit it.
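To make point 3 a bit more concrete, here is a minimal sketch in plain Python (again, not HBase code) of how sorted rowkeys map onto regions. The split keys and the `region_for` helper are assumptions made up for illustration; a real table's boundaries are chosen by HBase as regions split, or by whatever split keys you supply when pre-splitting. The sketch assumes each region covers a start-inclusive, end-exclusive range of byte-ordered keys.

```python
import bisect

# Hypothetical split points, for illustration only.
split_keys = [b"CUST_2", b"CUST_5"]

def region_for(rowkey: bytes) -> int:
    """Return the index of the region whose key range contains rowkey.

    Region 0 covers everything below split_keys[0]; region i covers
    [split_keys[i-1], split_keys[i]); the last region is open-ended.
    """
    return bisect.bisect_right(split_keys, rowkey)

keys = [b"CUST_34434", b"CUST_12345", b"CUST_9", b"CUST_5"]

# Python compares bytes lexicographically, byte by byte, which is the
# ordering this sketch assumes for rowkeys.
sorted_keys = sorted(keys)
assignment = {key: region_for(key) for key in sorted_keys}

for key, region in assignment.items():
    print(key.decode(), "-> region", region)
```

Note that CUST_9 sorts after CUST_34434 here, because the comparison is byte-wise rather than numeric. If numeric order matters for your scans, fixed-width, zero-padded IDs are one way to keep the two orderings in agreement.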

You can read/hear a little about HBase schema design here and here

Arnon Rotem-Gal-Oz answered Dec 29 '22 04:12
