Is anybody using SparkSQL on HBase tables directly, the way SparkSQL is used on Hive tables? I am new to Spark. Please guide me on how to connect HBase and Spark, and how to query HBase tables.
AFAIK there are two ways to connect to HBase tables:
1. Connect to HBase directly, create a DataFrame from the RDD, and execute SQL on top of that.
I'm not going to reinvent the wheel; please see "How to read from hbase using spark", where the answer from @iMKanchwala already describes how to read the table into an RDD. The only extra step is to convert that RDD into a DataFrame (using toDF) and then follow the SQL approach.
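A minimal sketch of approach 1 in Scala, assuming Spark 2.x or later (SparkSession) and an HBase client with TableInputFormat on the classpath (hbase-server in HBase 1.x, hbase-mapreduce in 2.x). The table name users, the column families small/large and the User case class are illustrative assumptions that mirror the Hive DDL example in this answer; they are not taken from the linked answer's code.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseToDataFrameSketch {
  // Hypothetical row type for an assumed "users" table (small:name, small:email, large:notes)
  case class User(userid: String, name: String, email: String, notes: String)

  // Reads one family:qualifier cell as a String; returns null when the cell is absent
  def cell(r: Result, family: String, qualifier: String): String =
    Option(r.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)))
      .map(b => Bytes.toString(b))
      .orNull

  // Converts one HBase Result into a User; the HBase row key becomes userid
  def toUser(r: Result): User =
    User(Bytes.toString(r.getRow),
      cell(r, "small", "name"), cell(r, "small", "email"), cell(r, "large", "notes"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-to-dataframe-sketch").getOrCreate()
    import spark.implicits._

    // Picks up hbase-site.xml from the classpath; "users" is an assumed table name
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "users")

    // Full scan of the table as an RDD of (row key, Result) pairs
    val hbaseRdd = spark.sparkContext.newAPIHadoopRDD(
      conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])

    // RDD -> DataFrame via toDF, then register it as a view so plain SQL works
    val users = hbaseRdd.map { case (_, r) => toUser(r) }.toDF()
    users.createOrReplaceTempView("users")

    spark.sql("SELECT name, email FROM users WHERE email IS NOT NULL").show()
    spark.stop()
  }
}
```

Keeping toUser as a plain function on Result means the row mapping can be checked without a cluster. The scan above reads the whole table, so for large tables you would normally narrow it first (for example with TableInputFormat.SCAN_COLUMNS) before converting.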
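2. Map the HBase table to a Hive table using the HBase storage handler, then query it with SparkSQL just like any other Hive table. For example: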
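-- Maps an existing HBase table "users" into Hive; ":key" binds userid to the HBase row key
CREATE EXTERNAL TABLE users(
  userid int, name string, email string, notes string)
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" =
  ":key,small:name,small:email,large:notes")
TBLPROPERTIES ("hbase.table.name" = "users");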
For details on how to do that, please see this example.
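Once the Hive table exists, querying it from Spark is ordinary SQL. A sketch, assuming Spark 2.x or later with Hive support, hive-site.xml on the classpath, and the hive-hbase-handler and HBase client jars available to Spark; usersWithNotes is a hypothetical helper name. As far as I know, Spark's own SQL parser rejects STORED BY, so the CREATE TABLE statement itself should be run in Hive (e.g. Beeline) rather than through spark.sql.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveHBaseQuerySketch {
  // Plain SQL against a table named "users"; in main it resolves to the Hive table via the metastore
  def usersWithNotes(spark: SparkSession): DataFrame =
    spark.sql("SELECT userid, name, email FROM users WHERE notes IS NOT NULL")

  def main(args: Array[String]): Unit = {
    // Assumes hive-site.xml plus the hive-hbase-handler and HBase client jars are on Spark's classpath
    val spark = SparkSession.builder()
      .appName("hive-hbase-query-sketch")
      .enableHiveSupport()
      .getOrCreate()
    usersWithNotes(spark).show()
    spark.stop()
  }
}
```

Because the query only refers to the table name, the same helper can be exercised against a temporary view with the same columns when no Hive metastore is available.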
I would prefer approach 1.
Hope that helps...