I am able to create external tables in hive of HBase, now i have a requirement to create an external table which is having variable columns, which means the columns in HBase are not fixed for the particular table, the no of columns and can be created dynamically at the time of data insertion, what should be the approach for handling such kind of situation.
Summery : How to create external tables in hive when the no of columns are not fixed in HBase table.
Thanks in advance.
Create table in Hbase shell
create 'hbase_2_hive_names', 'id', 'name', 'age'
Load data into Hbase (the input file must be in HDFS)
export HADOOP_CLASSPATH=$(/usr/local/hbase/bin/hbase classpath);$HADOOP_HOME/bin/hadoop jar /usr/local/hbase/hbase-0.94.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,id:id,name:fn,name:ln,age:age hbase_2_hive_names /var/data/samples/names.tsv
Create external table in Hive shell
CREATE EXTERNAL TABLE hbase_hive_names(hbid INT, id INT, fn STRING, ln STRING, age INT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:fn,name:ln,age:age") TBLPROPERTIES("hbase.table.name" = "hbase_2_hive_names");
Step1: Login to HBase Shell
hbase shell
Step2: Creating HBase table
hbase(main):001:0> create 'hbase_emp_table', [{NAME => 'per', COMPRESSION => 'SNAPPY'}, {NAME => 'prof', COMPRESSION => 'SNAPPY'} ]
Created table hbase_emp_table
Took 1.5417 seconds
=> Hbase::Table - hbase_emp_table
Step3: Describing the HBase table:
hbase(main):002:0> describe 'hbase_emp_table'
Table hbase_emp_table is ENABLED
hbase_emp_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'per', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING =>
'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'prof', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING =
> 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.1846 seconds
Step4: Inserting data to HBase table
put 'hbase_emp_table','1','per:name','Ranga Reddy'
put 'hbase_emp_table','1','per:age','32'
put 'hbase_emp_table','1','prof:des','Senior Software Engineer'
put 'hbase_emp_table','1','prof:sal','50000'
put 'hbase_emp_table','2','per:name','Nishanth Reddy'
put 'hbase_emp_table','2','per:age','3'
put 'hbase_emp_table','2','prof:des','Software Engineer'
put 'hbase_emp_table','2','prof:sal','80000'
Step5: Checking the HBase table data
hbase(main):012:0> scan 'hbase_emp_table'
ROW COLUMN+CELL
1 column=per:age, timestamp=1606304606241, value=32
1 column=per:name, timestamp=1606304606204, value=Ranga Reddy
1 column=prof:des, timestamp=1606304606269, value=Senior Software Engineer
1 column=prof:sal, timestamp=1606304606301, value=50000
2 column=per:age, timestamp=1606304606362, value=3
2 column=per:name, timestamp=1606304606338, value=Nishanth Reddy
2 column=prof:des, timestamp=1606304606387, value=Software Engineer
2 column=prof:sal, timestamp=1606304608374, value=80000
2 row(s)
Took 0.0513 seconds
Step6: Login to Hive shell using either hive or beeline
hive
Step7: Create a Hive table
CREATE EXTERNAL TABLE IF NOT EXISTS hive_emp_table(id INT, name STRING, age SMALLINT, designation STRING, salary BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,per:name,per:age,prof:des,prof:sal")
TBLPROPERTIES("hbase.table.name" = "hbase_emp_table");
Step8: Select the Hive table data
hive> select * from hive_emp_table;
INFO : OK
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| hive_emp_table.id | hive_emp_table.name | hive_emp_table.age | hive_emp_table.designation | hive_emp_table.salary |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| 1 | Ranga Reddy | 32 | Senior Software Engineer | 50000 |
| 2 | Nishanth Reddy | 3 | Software Engineer | 80000 |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
2 rows selected (17.401 seconds)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With