Does anybody know how to scan records based on some scan filter i.e.:
column:something = "somevalue"
Something like this, but from HBase shell?
It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value. PrefixFilter: takes a single argument, a prefix of a row key. It returns only those key-values present in a row that start with the specified row prefix.
To access the HBase shell, you have to navigate to the HBase home folder. You can start the HBase interactive shell using “hbase shell” command as shown below. If you have successfully installed HBase in your system, then it gives you the HBase shell prompt as shown below.
This filter is used to filter cells based on value. A wrapper filter that filters an entire row if any of the Cell checks do not pass. This comparator is for use with SingleColumnValueFilter, for filtering based on the value of a given column.
Scaning using HBase Shell The scan command is used to view the data in HTable. Using the scan command, you can get the table data. Its syntax is as follows: scan '<table name>'
Try this. It's kind of ugly, but it works for me.
import org.apache.hadoop.hbase.filter.CompareFilter import org.apache.hadoop.hbase.filter.SingleColumnValueFilter import org.apache.hadoop.hbase.filter.SubstringComparator import org.apache.hadoop.hbase.util.Bytes scan 't1', { COLUMNS => 'family:qualifier', FILTER => SingleColumnValueFilter.new (Bytes.toBytes('family'), Bytes.toBytes('qualifier'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('somevalue')) }
The HBase shell will include whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements are welcome):
# imports like above def scan_substr(table,family,qualifier,substr,*cols) scan table, { COLUMNS => cols, FILTER => SingleColumnValueFilter.new (Bytes.toBytes(family), Bytes.toBytes(qualifier), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new(substr)) } end
and then you can just say in the shell:
scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
scan 'test', {COLUMNS => ['F'],FILTER => \ "(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ (SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}
More information can be found here. Note that multiple examples reside in the attached Filter Language.docx
file.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With