I have a table in HBase named as "orders" it has column family 'o' and columns as {id,fname,lname,email} having row key as id. I am trying to get the value of fname and email only from hbase using spark. Currently what 'i am doing is given below
override def put(params: scala.collection.Map[String, Any]): Boolean = {
var sparkConfig = new SparkConf().setAppName("Connector")
var sc: SparkContext = new SparkContext(sparkConfig)
var hbaseConfig = HBaseConfiguration.create()
hbaseConfig.set("hbase.zookeeper.quorum", ZookeeperQourum)
hbaseConfig.set("hbase.zookeeper.property.clientPort", zookeeperPort)
hbaseConfig.set(TableInputFormat.INPUT_TABLE, schemdto.tableName);
hbaseConfig.set(TableInputFormat.SCAN_COLUMNS, "o:fname,o:email");
var hBaseRDD = sc.newAPIHadoopRDD(hbaseConfig, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result])
try {
hBaseRDD.map(tuple => tuple._2).map(result => result.raw())
.map(f => KeyValueToString(f)).saveAsTextFile(sink)
true
} catch {
case _: Exception => false
}
}
def KeyValueToString(keyValues: Array[KeyValue]): String = {
var it = keyValues.iterator
var res = new StringBuilder
while (it.hasNext) {
res.append( Bytes.toString(it.next.getValue()) + ",")
}
res.substring(0, res.length-1);
}
But nothing is returned and If I try to fetch only one column such as
hbaseConfig.set(TableInputFormat.SCAN_COLUMNS, "o:fname");
then it returns all the values of column fname
So my question is how to get multiple columns from hbase using spark
Any help will be appreciated.
List of columns to scan needs to be space-delimited, according to the documentation.
hbaseConfig.set(TableInputFormat.SCAN_COLUMNS, "o:fname o:email");
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With