Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter using HBase REST API

Tags:

rest

curl

hbase

Does anyone know anything about the HBase REST API? Im currently writing a program which inserts and reads from HBase using curl commands. When trying to read I use the curl get command, e.g.

curl -X GET 'http://server:9090/test/Row-1/Action:ActionType/' -h 'Accept:application/json'

This returns the column Action:ActionType from Row-1. If I want to do the equivalent of a WHERE clause using the GET command I am stuck however. Im not sure its even possible? If I want to find all records where Action:ActionType =1 for example. Help is appreciated!

like image 426
AlanDev1989 Avatar asked Mar 30 '17 15:03

AlanDev1989


People also ask

What is filter in HBase?

It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value. PrefixFilter: takes a single argument, a prefix of a row key. It returns only those key-values present in a row that start with the specified row prefix.

What is the role of filters in Apache HBase?

This filter is used for selecting only those keys with columns that matches a particular prefix. Filter to support scan multiple row key ranges. A binary comparator which lexicographically compares against the specified byte array using Bytes.

What is HBase rest server?

Apache HBase provides the ability to perform realtime random read/write access to large datasets. HBase is built on top of Apache Hadoop and can scale to billions of rows and millions of columns. One of the capabilities of Apache HBase is a REST server previously called Stargate.


1 Answers

You can do this by using a filter (here a SingleColumnValueFilter) in your CURL request.

First, create a XML file (myscanner.xml) describing your scan. Here we want to filter according to a qualifier value, with EQUAL operator) :

<Scanner batch="10">
    <filter>
        {
            "type": "SingleColumnValueFilter",
            "op": "EQUAL",
            "family": "<FAMILY_BASE64>",
            "qualifier": "<QUALIFIER_BASE64>",
            "latestVersion": true,
            "comparator": {
                "type": "BinaryComparator",
                "value": "<SEARCHED_VALUE_BASE64>"
            }
        }
    </filter>
</Scanner>

You should replace <FAMILY_BASE64>, <QUALIFIER_BASE64> and <SEARCHED_VALUE_BASE64> with your own values (values must be converted to base64, you can do echo -en ${FAMILY} | base64.

Then, submit a CURL request to HBase REST API with this XML file as data :

curl -vi -X PUT \
    -H "Content-Type:text/xml" \
    -d @myscanner.xml \
    "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/"

This request should return a Scanner object, like :

[...]
Location: http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/149123344543470bea57a

Then use the given scanner to iterate through results (request multiple times to iterate) :

curl -vi -X GET \
    -H "Accept: text/xml" \
    "http://${HOST}:${REST_API_PORT}/${TABLE_NAME}/scanner/149123344543470bea57a"

You can also accept "application/json" instead of XML. Notice that the results are base64 encoded.

Sources :

HBase REST Filter ( SingleColumnValueFilter )

A list of filters you can use : https://gist.github.com/stelcheck/3979381

Cloudera documentation about HBase REST API : https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_hbase_rest_api.html

like image 172
norbjd Avatar answered Oct 31 '22 11:10

norbjd