Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create external table for hbase

Tags:

hive

hbase

I am able to create external tables in hive of HBase, now i have a requirement to create an external table which is having variable columns, which means the columns in HBase are not fixed for the particular table, the no of columns and can be created dynamically at the time of data insertion, what should be the approach for handling such kind of situation.

Summery : How to create external tables in hive when the no of columns are not fixed in HBase table.

Thanks in advance.

like image 900
Infinity Avatar asked May 23 '26 04:05

Infinity


2 Answers

  1. Create table in Hbase shell

    create 'hbase_2_hive_names', 'id', 'name', 'age'

  2. Load data into Hbase (the input file must be in HDFS)

    export HADOOP_CLASSPATH=$(/usr/local/hbase/bin/hbase classpath);$HADOOP_HOME/bin/hadoop jar /usr/local/hbase/hbase-0.94.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,id:id,name:fn,name:ln,age:age hbase_2_hive_names /var/data/samples/names.tsv

  3. Create external table in Hive shell

    CREATE EXTERNAL TABLE hbase_hive_names(hbid INT, id INT, fn STRING, ln STRING, age INT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:fn,name:ln,age:age") TBLPROPERTIES("hbase.table.name" = "hbase_2_hive_names");

like image 70
Sanjay Subramanian Avatar answered May 25 '26 18:05

Sanjay Subramanian


Step1: Login to HBase Shell

hbase shell

Step2: Creating HBase table

hbase(main):001:0> create 'hbase_emp_table', [{NAME => 'per', COMPRESSION => 'SNAPPY'}, {NAME => 'prof', COMPRESSION => 'SNAPPY'} ]
Created table hbase_emp_table
Took 1.5417 seconds
=> Hbase::Table - hbase_emp_table

Step3: Describing the HBase table:

hbase(main):002:0> describe 'hbase_emp_table'
Table hbase_emp_table is ENABLED
hbase_emp_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'per', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING =>
 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'prof', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING =
> 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
 PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.1846 seconds

Step4: Inserting data to HBase table

put 'hbase_emp_table','1','per:name','Ranga Reddy'
put 'hbase_emp_table','1','per:age','32'
put 'hbase_emp_table','1','prof:des','Senior Software Engineer'
put 'hbase_emp_table','1','prof:sal','50000'

put 'hbase_emp_table','2','per:name','Nishanth Reddy'
put 'hbase_emp_table','2','per:age','3'
put 'hbase_emp_table','2','prof:des','Software Engineer'
put 'hbase_emp_table','2','prof:sal','80000'

Step5: Checking the HBase table data

hbase(main):012:0> scan 'hbase_emp_table'
ROW                                             COLUMN+CELL
 1                                              column=per:age, timestamp=1606304606241, value=32
 1                                              column=per:name, timestamp=1606304606204, value=Ranga Reddy
 1                                              column=prof:des, timestamp=1606304606269, value=Senior Software Engineer
 1                                              column=prof:sal, timestamp=1606304606301, value=50000
 2                                              column=per:age, timestamp=1606304606362, value=3
 2                                              column=per:name, timestamp=1606304606338, value=Nishanth Reddy
 2                                              column=prof:des, timestamp=1606304606387, value=Software Engineer
 2                                              column=prof:sal, timestamp=1606304608374, value=80000
2 row(s)
Took 0.0513 seconds

Step6: Login to Hive shell using either hive or beeline

hive

Step7: Create a Hive table

CREATE EXTERNAL TABLE IF NOT EXISTS hive_emp_table(id INT, name STRING, age SMALLINT, designation STRING, salary BIGINT) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,per:name,per:age,prof:des,prof:sal") 
TBLPROPERTIES("hbase.table.name" = "hbase_emp_table");

Step8: Select the Hive table data

hive> select * from hive_emp_table;
INFO  : OK
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| hive_emp_table.id  | hive_emp_table.name  | hive_emp_table.age  | hive_emp_table.designation  | hive_emp_table.salary  |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| 1                  | Ranga Reddy          | 32                  | Senior Software Engineer    | 50000                  |
| 2                  | Nishanth Reddy       | 3                   | Software Engineer           | 80000                  |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
2 rows selected (17.401 seconds)
like image 29
Ranga Reddy Avatar answered May 25 '26 18:05

Ranga Reddy