Recommended way to access HBase using Scala

Now that SpyGlass is no longer being maintained, what is the recommended way to access HBase using Scala/Scalding? A similar question was asked in 2013, but most of the suggested links are either dead or point to defunct projects. The only link that seems useful is the one to Apache Flink. Is that considered the best option nowadays? Are people still recommending SpyGlass for new projects even though it is no longer maintained? Performance (massively parallel) and testability are priorities.

Ellen Spertus asked May 18 '18 17:05



1 Answer

In my experience writing data to Cassandra with the Flink Cassandra connector, I think the best way is to use Flink's built-in connectors. Since Flink 1.4.3 you can use the Flink HBase connector. See here

Soheil Pourbafrani answered Oct 06 '22 17:10
