Now that SpyGlass is no longer being maintained, what is the recommended way to access HBase using Scala/Scalding? A similar question was asked in 2013, but most of the suggested links are either dead or to defunct projects. The only link that seems useful is to Apache Flink. Is that considered the best option nowadays? Are people still recommending SpyGlass for new projects even though it isn't been maintained? Performance (massively parallel) and testability are priorities.
With SHC, Spark can execute batch jobs to read/write data from/into Phoenix tables. Phoenix can also read/write data from/into HBase tables created by SHC.
Create a Dataproc cluster, installing Apache HBase and Apache ZooKeeper on the cluster. Create an HBase table using the HBase shell running on the master node of the Dataproc cluster. Use Cloud Shell to submit a Java or PySpark Spark job to the Dataproc service that writes data to, then reads data from, the HBase table.
According to my experiences in writing data Cassandra using Flink Cassandra connector, I think the best way is to use Flink built-in connectors. Since Flink 1.4.3
you can use HBase Flink connector. See here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With