I've used scans over data stored in Accumulo before, and have gotten the whole result set back (whatever Range I specified). The problem is, I would like to filter those results on the server side, before the client receives them. I'm hoping someone has a simple code example of how this is done.
From my understanding, Filter provides some (all?) of this functionality, but how is this used in practice using the API? I see an example using Filter on the shell client, from the Accumulo documentation here: http://accumulo.apache.org/user_manual_1.3-incubating/examples/filter.html
I couldn't find any code examples online of a simple way to filter a scan based on regular expressions over any of the data, although I'm thinking this should be something relatively easy to do.
The Filter class lays the framework for the functionality you want. To create a custom filter, you need to extend Filter and implement the accept(Key k, Value v) method. If you are only looking to filter based on regular expressions, you can avoid writing your own filter by using RegExFilter.
Using a RegExFilter is straightforward. Here is an example:
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.iterators.user.RegExFilter;

// first connect to Accumulo
ZooKeeperInstance inst = new ZooKeeperInstance(instanceName, zooServers);
Connector connect = inst.getConnector(user, password);
// initialize a scanner
Scanner scan = connect.createScanner(myTableName, myAuthorizations);
// to use a filter, which is an iterator, you must create an IteratorSetting
// specifying which iterator class you are using
IteratorSetting iter = new IteratorSetting(15, "myFilter", RegExFilter.class);
// next set the regular expressions to match. Here, I want all key/value pairs in
// which the column family begins with "J"; a null regex leaves that field unfiltered
String rowRegex = null;
String colfRegex = "J.*";
String colqRegex = null;
String valueRegex = null;
// false = every configured regex must match (AND); true = any one may match (OR)
boolean orFields = false;
RegExFilter.setRegexs(iter, rowRegex, colfRegex, colqRegex, valueRegex, orFields);
// now add the iterator to the scanner, and you're all set
scan.addScanIterator(iter);
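The first two parameters of the IteratorSetting constructor (priority and name) matter little in this case: the priority only determines the order in which iterators run when several are configured, and the name only has to be unique among the iterators on the scanner. Once you've added the above code, iterating through the scanner will only return key/value pairs that match the regex parameters. Because scan iterators run on the tablet servers, non-matching entries are dropped before they are sent to the client.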
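To make the roles of the null regexes and orFields more concrete, here is a small standalone sketch in plain Java that mimics, as far as I understand it, how RegExFilter decides whether to keep an entry. It does not use Accumulo at all: the class RegexFilterSemantics and its methods are hypothetical names made up for illustration, and the behaviour modelled here (whole-field matching by default, a null regex being skipped) is my reading of RegExFilter, not a copy of its code.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in that models RegExFilter's accept logic; not an Accumulo class.
public class RegexFilterSemantics {

    // A null regex means "not configured": it passes in AND mode and
    // contributes nothing in OR mode. Otherwise the whole field must match.
    public static boolean fieldMatches(String regex, String field, boolean orFields) {
        if (regex == null) {
            return !orFields;
        }
        return Pattern.matches(regex, field); // full match, not a substring search
    }

    // regexes and fields are ordered as: row, column family, column qualifier, value.
    public static boolean accept(String[] regexes, String[] fields, boolean orFields) {
        boolean result = !orFields; // AND starts from true, OR starts from false
        for (int i = 0; i < regexes.length; i++) {
            boolean m = fieldMatches(regexes[i], fields[i], orFields);
            result = orFields ? (result || m) : (result && m);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] entry = {"row1", "Jones", "age", "42"};

        // Same settings as the answer: only the column family is constrained.
        System.out.println(accept(new String[] {null, "J.*", null, null}, entry, false)); // true

        // "J" alone does not match "Jones", because the whole field must match.
        System.out.println(accept(new String[] {null, "J", null, null}, entry, false)); // false

        // With orFields = true, one matching field is enough.
        System.out.println(accept(new String[] {"row2", "J.*", null, null}, entry, true)); // true
    }
}
```

This is why the example uses "J.*" and not just "J" for "begins with J". If you want substring matching instead, check the RegExFilter javadoc for your Accumulo version before relying on this sketch.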