Is there a way to retrieve the row keys in a given range without actually retrieving the columns/CFs associated with that row key?
For clarification: In my example, our table's row keys are stock ticker names (e.g. GOOG), and in our web app we'd like to populate an autocomplete widget using just the row keys we have in the database. Obviously, if we retrieve all the data (instead of only the stock names) for all the stocks between G and H when a user types 'G', we'll be unnecessarily straining our system. Any ideas?
According to the official documentation, you can optimally retrieve only the row keys using a combination of two filters: the KeyOnlyFilter and the FirstKeyOnlyFilter. (I think the "FirstKeyOnlyFilter" will return the key only once, even with large, complex rows.) If you only want keys in a given range, you can add that range to the scanner.
Here is some example code:
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
new FirstKeyOnlyFilter(),
new KeyOnlyFilter());
Scan s = new Scan(filters);
// in order to limit the scan to a range
s.setStartRow(startRowKey); // first key in range
s.setStopRow(stopRowKey); // key value after the last key in the range
Source: https://hbase.apache.org/book.html#perf.hbase.client.rowkeyonly
take a look at the filters (http://hbase.apache.org/book/client.filter.html), especially KeyOnlyFilter. the description of the filter (by http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html) is
A filter that will only return the key component of each KV (the value will be rewritten as empty).
in order to restrict the keys on a specific range use the Scan(rowStart, rowEnd) constructor.
I would create a column family called 'empty:', and store empty values for all the rows. Now, you can just just request to load the column 'empty:'. This is not ideal, but it is better than loading columns families with lot of data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With