NOTE: I started with HBase only a few hours ago, and I come from an RDBMS background :P
I have an RDBMS-like table CUSTOMERS having the following columns:
I have thought of the following HBase equivalent:
table: CUSTOMERS
rowkey: CUSTOMER_ID
column family: CUSTOMER_INFO
columns: NAME, EMAIL, ADDRESS, MOBILE
From whatever I have read, a primary key in an RDBMS table is roughly similar to an HBase table's rowkey. Accordingly, I want to keep CUSTOMER_ID as the rowkey.
My questions are dumb and straightforward :
***Edited to add sample code snippet
I'm simply trying to create one row for the customer table using 'put' in the shell. I did this :
hbase(main):011:0> put 'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:NAME','Omkar Joshi'
0 row(s) in 0.1030 seconds
hbase(main):012:0> scan 'CUSTOMERS'
ROW COLUMN+CELL
CUSTID12345 column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0500 seconds
hbase(main):013:0> put 'CUSTOMERS', 'CUSTID614', 'CUSTOMER_INFO:NAME','Prachi Shah', 'CUSTOMER_INFO:EMAIL','[email protected]'
ERROR: wrong number of arguments (6 for 5)
Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at
row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase(main):014:0> put 'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:EMAIL','[email protected]'
0 row(s) in 0.0160 seconds
hbase(main):015:0>
hbase(main):016:0* scan 'CUSTOMERS'
ROW COLUMN+CELL
CUSTID12345 column=CUSTOMER_INFO:EMAIL, timestamp=1365600369284, [email protected]
CUSTID12345 column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0230 seconds
Since put takes at most 5 arguments, I could not figure out how to insert an entire row with a single put command. Issuing one put per column seems to create incremental versions of the same row, which I don't want, and I'm not sure whether CUSTOMER_ID is actually being used as the rowkey! Thanks and regards!
You don't. The key (and any other column, for that matter) is a byte array, so you can put whatever you want in it, even encapsulate sub-entities.
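To make the "byte array" point concrete, here is a minimal Python sketch of packing two fields into one key. The layout (customer id, a zero-byte separator, then a big-endian sequence number) and the names make_rowkey and parse_rowkey are made up for illustration; HBase does not prescribe any of this, and the sketch assumes the customer id never contains a zero byte.

```python
import struct

def make_rowkey(customer_id: str, order_seq: int) -> bytes:
    # Hypothetical layout: UTF-8 id + NUL separator + 4-byte big-endian counter.
    return customer_id.encode("utf-8") + b"\x00" + struct.pack(">I", order_seq)

def parse_rowkey(key: bytes):
    # Split on the first NUL; everything after it is the packed counter.
    customer_id, _, rest = key.partition(b"\x00")
    (order_seq,) = struct.unpack(">I", rest)
    return customer_id.decode("utf-8"), order_seq

key = make_rowkey("CUSTID12345", 2)
print(parse_rowkey(key))
```

Big-endian packing is chosen here so that keys compared byte by byte sort in the same order as the numbers they encode.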
Not sure I understand that. Each value is stored as key + column family + column qualifier + timestamp + value, so the key is there.
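A toy Python model may help here. It is not how HBase is implemented, only a sketch of the idea that every put writes one cell addressed by row key, column, and timestamp, so two puts with the same row key land in the same row. The class ToyTable and the email value are invented for the example; the row key, column names, and timestamps are taken from your shell session.

```python
from collections import defaultdict

class ToyTable:
    """In-memory stand-in for a table: one entry per (row, column), versioned by timestamp."""

    def __init__(self):
        # (rowkey, "family:qualifier") -> {timestamp: value}
        self.cells = defaultdict(dict)

    def put(self, row, column, value, ts):
        self.cells[(row, column)][ts] = value

    def get_row(self, row):
        # Newest version of every column stored under this row key.
        return {col: versions[max(versions)]
                for (r, col), versions in self.cells.items() if r == row}

    def row_count(self):
        return len({r for (r, _) in self.cells})

t = ToyTable()
t.put("CUSTID12345", "CUSTOMER_INFO:NAME", "Omkar Joshi", 1365600052104)
# Placeholder email value, not the one from the original session.
t.put("CUSTID12345", "CUSTOMER_INFO:EMAIL", "omkar@example.com", 1365600369284)
print(t.row_count())
print(t.get_row("CUSTID12345"))
```

In this model the two puts produce a single row with two columns, which matches the "1 row(s)" your second scan reports. A new version only appears when the same column of the same row is written again.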
HBase figures out which region a record goes to as it goes. When a region gets too big, HBase splits it. From time to time, when too much junk has accumulated, HBase also performs compactions to rearrange the files. You can control the region boundaries by pre-partitioning the table yourself, which is something you should definitely think about in the future. However, since it seems you are just starting out with HBase, you can let HBase take care of that for now. Once you understand your usage patterns and data better, you will probably want to revisit it.
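As a rough illustration of what pre-partitioning means, the sketch below maps a row key to a region given a sorted list of split points. The function region_for and the split points are made up for the example; real region assignment is handled by HBase itself, and this only models the rule that each region covers a contiguous range of row keys, start inclusive and end exclusive.

```python
from bisect import bisect_right

def region_for(rowkey: bytes, split_points: list) -> int:
    # Region 0 holds keys below the first split point; region i holds keys in
    # [split_points[i-1], split_points[i]); the last region is open-ended.
    return bisect_right(split_points, rowkey)

splits = [b"CUSTID3", b"CUSTID6"]  # hypothetical split points, sorted
for key in (b"CUSTID12345", b"CUSTID3", b"CUSTID614"):
    print(key, "-> region", region_for(key, splits))
```

Row keys are compared byte by byte, so b"CUSTID12345" sorts before b"CUSTID3". That is worth keeping in mind when picking split points for keys that embed numbers as text.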
You can read/hear a little about HBase schema design here and here