Get HBase Row Keys in Range without Retrieving Data?

Tags:

hbase

Is there a way to retrieve the row keys in a given range without actually retrieving the columns/CFs associated with that row key?

For clarification: In my example, our table's row keys are stock ticker names (e.g. GOOG), and in our web app we'd like to populate an autocomplete widget using just the row keys we have in the database. Obviously, if we retrieve all the data (instead of only the stock names) for all the stocks between G and H when a user types 'G', we'll be unnecessarily straining our system. Any ideas?

Foxichu asked Apr 14 '11 19:04


3 Answers

According to the official documentation, the most efficient way to retrieve only the row keys is to combine two filters: FirstKeyOnlyFilter and KeyOnlyFilter. FirstKeyOnlyFilter returns only the first KeyValue of each row, so each key comes back once even for large, complex rows, and KeyOnlyFilter strips the value from it. If you only want keys in a given range, set that range on the scan.

Here is some example code:

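import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
// both filters must pass: one KeyValue per row, with its value stripped
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
            new FirstKeyOnlyFilter(),
            new KeyOnlyFilter());
Scan s = new Scan();   // Scan has no constructor that takes only a filter
s.setFilter(filters);
// in order to limit the scan to a range
s.setStartRow(startRowKey);  // first key in range (inclusive)
s.setStopRow(stopRowKey);    // key value after the last key in the range (exclusive)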

Source: https://hbase.apache.org/book.html#perf.hbase.client.rowkeyonly
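For the autocomplete case in the question, the stop row has to be the first key that sorts after everything starting with the typed prefix ('G' gives 'H'). A minimal sketch of one way to derive it, in plain Java with no HBase dependency; the class and method names are hypothetical and not part of the HBase API, and it assumes row keys are compared as unsigned bytes, which is how HBase orders them:

```java
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: derives an exclusive stop row from a key prefix. */
final class PrefixRange {
    private PrefixRange() {}

    /**
     * Returns the smallest key that sorts after every key beginning with the prefix.
     * Trailing 0xFF bytes cannot be incremented, so they are dropped first.
     * Returns an empty array when no such key exists (empty prefix, or all bytes 0xFF);
     * HBase treats an empty stop row as "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end); // copy so the caller's array is untouched
        stop[end - 1]++;                           // bump the last remaining byte
        return stop;
    }
}
```

With this, the bytes of the typed prefix would serve as the start row and PrefixRange.stopRowForPrefix(prefixBytes) as the stop row of the scan above.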

Steve DeNeefe answered Nov 12 '22 03:11


Take a look at the filters (http://hbase.apache.org/book/client.filter.html), especially KeyOnlyFilter. The description of the filter (from http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html) is:

A filter that will only return the key component of each KV (the value will be rewritten as empty).

To restrict the scan to a specific key range, use the Scan(rowStart, rowEnd) constructor.

divadpoc answered Nov 12 '22 03:11


I would create a column family called 'empty:' and store an empty value in it for every row. Then you can request just the 'empty:' family. This is not ideal, but it is better than loading column families with a lot of data.

dminer answered Nov 12 '22 04:11
