
How to filter Scan on Accumulo using RegEx

Tags: regex, accumulo

I've run scans over data stored in Accumulo before and gotten the whole result set back for whatever Range I specified. The problem is that I would like Accumulo to filter those results on the server side before the client receives them. I'm hoping someone has a simple code example of how this is done.

From my understanding, Filter provides some (all?) of this functionality, but how is it used in practice through the API? I see an example of using Filter from the shell client in the Accumulo documentation here: http://accumulo.apache.org/user_manual_1.3-incubating/examples/filter.html

I couldn't find any code examples online of a simple way to filter a scan based on regular expressions over any of the data, although I'm thinking this should be something relatively easy to do.

Jack asked Jan 14 '23 11:01



1 Answer

The Filter class lays the framework for the functionality you want. To create a custom filter, you need to extend Filter and implement the accept(Key k, Value v) method. If you are only looking to filter based on regular expressions, you can avoid writing your own filter by using RegExFilter.
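To make the "extend Filter and implement accept" pattern concrete, here is a minimal sketch that runs without the Accumulo jar. Everything in it is a stand-in I made up for illustration: `Entry` replaces Accumulo's Key and Value, `SketchFilter` replaces the real Filter base class, and `MinValueFilter` is a hypothetical filter doing a numeric comparison, which is the kind of check a regex handles poorly. In real code you would extend Accumulo's Filter and receive a Key and a Value in accept.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomFilterSketch {

    /** Stand-in for one key/value pair; NOT Accumulo's Key and Value classes. */
    public static final class Entry {
        public final String row;
        public final String value;

        public Entry(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Stand-in for the Filter base class: subclasses supply accept(). */
    public abstract static class SketchFilter {
        public abstract boolean accept(Entry e);

        /** Returns only the entries accept() approves, in their original order. */
        public List<Entry> apply(List<Entry> entries) {
            List<Entry> kept = new ArrayList<>();
            for (Entry e : entries) {
                if (accept(e)) {
                    kept.add(e);
                }
            }
            return kept;
        }
    }

    /** Hypothetical custom filter: keep entries whose value parses to a number >= min. */
    public static final class MinValueFilter extends SketchFilter {
        private final long min;

        public MinValueFilter(long min) {
            this.min = min;
        }

        @Override
        public boolean accept(Entry e) {
            try {
                return Long.parseLong(e.value) >= min;
            } catch (NumberFormatException ex) {
                return false; // non-numeric values are dropped
            }
        }
    }

    public static void main(String[] args) {
        List<Entry> data = new ArrayList<>();
        data.add(new Entry("row1", "150"));
        data.add(new Entry("row2", "7"));
        data.add(new Entry("row3", "n/a"));
        // Only row1 passes the filter.
        for (Entry e : new MinValueFilter(100).apply(data)) {
            System.out.println(e.row + " -> " + e.value);
        }
    }
}
```

The point of the shape is that the filter only answers yes or no per entry; the surrounding machinery (here `apply`, in Accumulo the iterator stack on the tablet server) decides what to do with the answer.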

Using a RegExFilter is straightforward. Here is an example:

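// Imports used by this snippet
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.iterators.user.RegExFilter;

// First connect to Accumulo (instanceName, zooServers, user, password,
// myTableName and myAuthorizations are assumed to be defined elsewhere)
ZooKeeperInstance inst = new ZooKeeperInstance(instanceName, zooServers);
Connector connect = inst.getConnector(user, password);

// Initialize a scanner
Scanner scan = connect.createScanner(myTableName, myAuthorizations);

// To use a filter, which is an iterator, you must create an IteratorSetting
// specifying which iterator class you are using
IteratorSetting iter = new IteratorSetting(15, "myFilter", RegExFilter.class);
// Next set the regular expressions to match. Here, I want all key/value pairs in
// which the column family begins with "J"; a null regex leaves that field unchecked
String rowRegex = null;
String colfRegex = "J.*";
String colqRegex = null;
String valueRegex = null;
boolean orFields = false;
RegExFilter.setRegexs(iter, rowRegex, colfRegex, colqRegex, valueRegex, orFields);
// Now add the iterator to the scanner, and you're all set
scan.addScanIterator(iter);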

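The first two parameters of the IteratorSetting constructor (priority and name) matter little in this case: the priority only determines where this iterator runs relative to other iterators, and the name just has to be unique among the iterators on the scanner. Once you've added the above code, iterating through the scanner will only return key/value pairs that match the regex parameters.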
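If it helps to see how the regex parameters combine, below is a plain-Java sketch of the matching rules as I understand them. It does not use Accumulo at all; `RegexMatchSketch` and `accepts` are hypothetical names. The assumptions are that a null regex leaves its field unchecked, that each regex must match the whole field (so "J.*" rather than just "J"), and that orFields switches between "every supplied regex must match" and "at least one supplied regex must match".

```java
import java.util.regex.Pattern;

public class RegexMatchSketch {

    /**
     * fields and regexes are parallel arrays: row, column family,
     * column qualifier, value. A null regex means "do not check this field".
     */
    public static boolean accepts(String[] fields, String[] regexes, boolean orFields) {
        for (int i = 0; i < fields.length; i++) {
            if (regexes[i] == null) {
                continue; // unconstrained field
            }
            // Pattern.matches requires the regex to match the entire field
            boolean matched = Pattern.matches(regexes[i], fields[i]);
            if (orFields && matched) {
                return true;  // OR mode: one match is enough
            }
            if (!orFields && !matched) {
                return false; // AND mode: one miss rejects the entry
            }
        }
        // AND mode: nothing failed. OR mode: nothing matched.
        return !orFields;
    }

    public static void main(String[] args) {
        String[] entry = {"row1", "Jones", "first", "Liz"};
        String[] onlyColf = {null, "J.*", null, null};
        String[] rowAndColf = {"x.*", "J.*", null, null};

        System.out.println(accepts(entry, onlyColf, false));   // true
        System.out.println(accepts(entry, rowAndColf, false)); // false
        System.out.println(accepts(entry, rowAndColf, true));  // true
    }
}
```

One practical consequence of the whole-field assumption: a pattern like "J" on its own would only match a column family that is exactly "J", which is why the answer's example uses "J.*".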

Liz answered Jan 31 '23 21:01
