
SparkSQL on HBase Tables

Is anybody using SparkSQL directly on HBase tables, the way SparkSQL is used on Hive tables? I am new to Spark. Please guide me on how to connect HBase and Spark, and how to query HBase tables.

user6608138 asked Sep 16 '16 11:09


1 Answer

AFAIK there are 2 ways to connect to HBase tables:

- Connect to HBase directly:

Connect to HBase directly, create a DataFrame from the resulting RDD, and execute SQL on top of that. I'm not going to reinvent the wheel: please see "How to read from hbase using spark", where the answer from @iMKanchwala already describes the read. The only thing left is to convert that RDD into a DataFrame (using toDF) and follow the SQL approach.

- Register the table as a Hive external table with the HBase storage handler, and then query it with Hive on Spark through HiveContext. This is also an easy way.

Ex:
-- :key maps the first Hive column (userid) to the HBase row key
CREATE TABLE users(
userid int, name string, email string, notes string)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" =
":key,small:name,small:email,large:notes");

For how to do that, please see the linked example.
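To make approach 2 a bit more concrete, here is a small Python sketch that builds the Hive DDL for an HBase-backed external table. The function name `hbase_external_table_ddl` and its arguments are my own and not part of Hive or Spark. What I am relying on from Hive's HBase integration is the `:key` mapping entry, the `hbase.table.name` table property, and the rule that the mapping string has one entry per Hive column with no spaces between entries. It only produces a string, so you still have to run the statement in Hive yourself.

```python
from typing import List, Tuple


def hbase_external_table_ddl(
    hive_table: str,
    hbase_table: str,
    columns: List[Tuple[str, str, str]],
) -> str:
    """Build Hive DDL for an external table backed by an existing HBase table.

    columns: (hive_column, hive_type, hbase_mapping) triples, where
    hbase_mapping is ":key" for the row key or "family:qualifier" otherwise.
    This helper is hypothetical; it is not a Hive or Spark API.
    """
    key_columns = [c for c in columns if c[2] == ":key"]
    if len(key_columns) != 1:
        raise ValueError("exactly one column must map to :key")

    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # No whitespace between entries: Hive would treat it as part of the name.
    mapping = ",".join(hbase_mapping for _, _, hbase_mapping in columns)

    return (
        f"CREATE EXTERNAL TABLE {hive_table}({column_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


if __name__ == "__main__":
    print(
        hbase_external_table_ddl(
            "users",
            "users",
            [
                ("userid", "int", ":key"),
                ("name", "string", "small:name"),
                ("email", "string", "small:email"),
                ("notes", "string", "large:notes"),
            ],
        )
    )
```

Once the table exists in the Hive metastore, a Spark session with Hive support should be able to query it with ordinary SQL such as SELECT name, email FROM users, provided the Hive HBase handler and HBase client jars are on Spark's classpath (that part is an assumption about your setup).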

I would prefer approach 1.
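For approach 1, the part that is left to you after reading from HBase is turning each HBase row into a flat record before calling toDF. Here is a minimal sketch in plain Python. It assumes each row arrives as a row key plus a dict from `family:qualifier` bytes to value bytes, and that the key and all values were written as UTF-8 strings. Both are assumptions about your data, and `result_to_row` and `COLUMNS` are hypothetical names, not HBase or Spark APIs.

```python
from typing import Dict, Optional, Tuple

# HBase columns to extract, in the order they should appear after userid.
COLUMNS = ("small:name", "small:email", "large:notes")

Row = Tuple[int, Optional[str], Optional[str], Optional[str]]


def result_to_row(row_key: bytes, cells: Dict[bytes, bytes]) -> Row:
    """Flatten one HBase row into (userid, name, email, notes).

    Assumes the row key is a decimal number stored as a UTF-8 string and
    every cell value is UTF-8 text; missing cells become None.
    """
    userid = int(row_key.decode("utf-8"))
    values = []
    for column in COLUMNS:
        raw = cells.get(column.encode("utf-8"))
        values.append(None if raw is None else raw.decode("utf-8"))
    return (userid, values[0], values[1], values[2])


if __name__ == "__main__":
    sample = {b"small:name": b"Ann", b"large:notes": b"likes HBase"}
    print(result_to_row(b"42", sample))  # (42, 'Ann', None, 'likes HBase')
```

Mapping a function like this over the RDD gives tuples in the order (userid, name, email, notes), which can then be converted with toDF and registered as a temporary view so that SQL can run on it.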

Hope that helps...

Ram Ghadiyaram answered Sep 27 '22 16:09