Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scan with filter using HBase shell

Tags:

nosql

hbase

Does anybody know how to scan records based on some scan filter i.e.:

column:something = "somevalue"

Something like this, but from HBase shell?

like image 803
Gandalf StormCrow Avatar asked Aug 31 '11 11:08

Gandalf StormCrow


People also ask

What is filter in HBase?

It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value. PrefixFilter: takes a single argument, a prefix of a row key. It returns only those key-values present in a row that start with the specified row prefix.

How do I use HBase shell?

To access the HBase shell, you have to navigate to the HBase home folder. You can start the HBase interactive shell using “hbase shell” command as shown below. If you have successfully installed HBase in your system, then it gives you the HBase shell prompt as shown below.

What is the role of filters in Apache HBase?

This filter is used to filter cells based on value. A wrapper filter that filters an entire row if any of the Cell checks do not pass. This comparator is for use with SingleColumnValueFilter, for filtering based on the value of a given column.

What is scan in HBase?

Scaning using HBase Shell The scan command is used to view the data in HTable. Using the scan command, you can get the table data. Its syntax is as follows: scan '<table name>'


2 Answers

Try this. It's kind of ugly, but it works for me.

import org.apache.hadoop.hbase.filter.CompareFilter import org.apache.hadoop.hbase.filter.SingleColumnValueFilter import org.apache.hadoop.hbase.filter.SubstringComparator import org.apache.hadoop.hbase.util.Bytes scan 't1', { COLUMNS => 'family:qualifier', FILTER =>     SingleColumnValueFilter.new         (Bytes.toBytes('family'),          Bytes.toBytes('qualifier'),          CompareFilter::CompareOp.valueOf('EQUAL'),          SubstringComparator.new('somevalue')) } 

The HBase shell will include whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements are welcome):

# imports like above def scan_substr(table,family,qualifier,substr,*cols)     scan table, { COLUMNS => cols, FILTER =>         SingleColumnValueFilter.new             (Bytes.toBytes(family), Bytes.toBytes(qualifier),              CompareFilter::CompareOp.valueOf('EQUAL'),              SubstringComparator.new(substr)) } end 

and then you can just say in the shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier' 
like image 113
bhavanki Avatar answered Sep 26 '22 08:09

bhavanki


scan 'test', {COLUMNS => ['F'],FILTER => \  "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"} 

More information can be found here. Note that multiple examples reside in the attached Filter Language.docx file.

like image 41
dape Avatar answered Sep 25 '22 08:09

dape