As a developer, I've created an HBase table for our project by importing data from an existing MySQL table using a Sqoop job. The problem is that our data analyst team is familiar with MySQL syntax, which means they can query a Hive table easily. For them, I need to expose the HBase table in Hive. I don't want to duplicate the data by populating it again in Hive, since duplicated data might lead to consistency issues in the future.
Can I expose the HBase table in Hive without duplicating data? If yes, how do I do it? Also, if I insert, update, or delete data in my HBase table, will the updated data appear in Hive without any issues?
Sometimes our data analytics team creates tables and populates data in Hive. Can I expose those tables to HBase? If yes, how?
HBase-Hive Integration:
Creating an external table in Hive for an HBase table allows the HBase data to be queried in Hive without duplicating it. Because the external table reads from HBase at query time, you can insert, update, or delete data in the HBase table and see the modified data in Hive too.
Example:
Consider an HBase table with the column families id, name, and email.
Sample external table command for Hive:
CREATE EXTERNAL TABLE hivehbasetable(key INT, id INT, username STRING, password STRING, email STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:username,name:password,email:email") TBLPROPERTIES("hbase.table.name" = "hbasetable");
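Once the external table exists, analysts can query it with ordinary HiveQL. The following is a minimal sketch that assumes the `hivehbasetable` definition above; the row key value `1` is a hypothetical example, not data from the original table.

```sql
-- Assumes the external table hivehbasetable defined above already exists.
-- Look up one row by its HBase row key (mapped to the Hive column "key").
-- The key value 1 is a made-up example.
SELECT key, username, email
FROM hivehbasetable
WHERE key = 1;

-- Aggregations run against the live HBase data as well.
SELECT COUNT(*) AS total_rows
FROM hivehbasetable;
```

Since Hive reads from HBase each time a query runs, rerunning these statements after a change in HBase should return the updated values.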
For more information on Hive-HBase integration, look here
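For the reverse direction (tables that analysts created and populated in Hive), one possible approach is to create a Hive table backed by the same storage handler and load it from the native Hive table. This is only a sketch: `analyst_users`, `hive_users_hbase`, `analyst_users_hbase`, and the column names are hypothetical, and some Hive versions or distributions restrict non-external tables that use a storage handler, so check what your version allows.

```sql
-- Hypothetical names throughout:
--   analyst_users        existing native Hive table with columns id, username, email
--   hive_users_hbase     new Hive table whose data is stored in HBase
--   analyst_users_hbase  name of the underlying HBase table
CREATE TABLE hive_users_hbase(key INT, username STRING, email STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:username,email:email")
TBLPROPERTIES ("hbase.table.name" = "analyst_users_hbase");

-- Copy the rows from the native Hive table into the HBase-backed table.
INSERT OVERWRITE TABLE hive_users_hbase
SELECT id, username, email
FROM analyst_users;
```

Note that this copies the data into HBase once. To avoid keeping two copies in sync, the analysts would then keep working against the HBase-backed table.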
Using Apache Phoenix
One quick solution would be to use an Apache Phoenix layer over the HBase tables. Apache Phoenix is an interface that enables OLTP-style SQL queries over the HBase NoSQL store. It does not copy the data; rather, it exposes a SQL view of the data already present in HBase.
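As a rough sketch, a Phoenix view over the same HBase table could look like the following. It assumes the table is named `hbasetable` with the column families and qualifiers from the Hive example, and that all values were written as UTF-8 strings (hence `VARCHAR` everywhere); `pk` is a name chosen here for the row key, and the key value `'1'` is made up.

```sql
-- Phoenix SQL (run from a Phoenix client), not HiveQL.
-- Double quotes preserve the lowercase HBase table, family, and qualifier names.
CREATE VIEW "hbasetable" (
    pk VARCHAR PRIMARY KEY,          -- maps to the HBase row key
    "id"."id" VARCHAR,
    "name"."username" VARCHAR,
    "name"."password" VARCHAR,
    "email"."email" VARCHAR
);

-- Query the view with plain SQL; the key value '1' is a hypothetical example.
SELECT "username", "email"
FROM "hbasetable"
WHERE pk = '1';
```

A view is read-only and leaves the existing HBase data untouched, which fits the goal of not duplicating anything.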