 

HBase filters on multiple column families and a qualifier range return 0 rows

Tags:

hbase

A FilterList on a single column family works, but on multiple column families it returns 0 rows. The problem statement is the same as "How to apply several QualifierFilter to a row in HBase", but I cannot use SingleColumnValueFilter because the column qualifier is a timestamp. So my filter looks like this:

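    import org.apache.hadoop.hbase.filter.{BinaryComparator, ColumnRangeFilter, FamilyFilter, FilterList, ValueFilter}
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp
    import org.apache.hadoop.hbase.util.Bytes

    // fromDate, toDate, comparison_operator and value are defined elsewhere.
    // A FilterList created without an operator uses MUST_PASS_ALL (AND).
    val master_filter_list = new FilterList()

    // Family "ac": timestamp qualifiers (millis) in [fromDate, toDate], Int-encoded values
    val outer_fl_A = new FilterList()
    val cf_filter_A = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("ac")))
    val qualifier_range_A = new ColumnRangeFilter(Bytes.toBytes(fromDate.getMillis), true, Bytes.toBytes(toDate.getMillis), true)
    val ac_fl = new ValueFilter(comparison_operator, new BinaryComparator(Bytes.toBytes(value.toString.toInt)))
    outer_fl_A.addFilter(cf_filter_A)
    outer_fl_A.addFilter(qualifier_range_A)
    outer_fl_A.addFilter(ac_fl)
    master_filter_list.addFilter(outer_fl_A)

    // Family "t": same qualifier range, String-encoded values
    val outer_fl_B = new FilterList()
    val cf_filter_B = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("t")))
    val qualifier_range_B = new ColumnRangeFilter(Bytes.toBytes(fromDate.getMillis), true, Bytes.toBytes(toDate.getMillis), true)
    val ts_fl = new ValueFilter(comparison_operator, new BinaryComparator(Bytes.toBytes(value.toString)))
    outer_fl_B.addFilter(cf_filter_B)
    outer_fl_B.addFilter(qualifier_range_B)
    outer_fl_B.addFilter(ts_fl)
    master_filter_list.addFilter(outer_fl_B)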

What is the right way to get only the rows of the table that match both outer_fl_A AND outer_fl_B?

asked Feb 02 '16 13:02 by Abhi


2 Answers

Whether outer_fl_A and outer_fl_B are on different column families or on the same one, if you want values that satisfy either of your filters, you should combine them with OR (FilterList.Operator.MUST_PASS_ONE) when adding them to the scan.
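An OR (MUST_PASS_ONE) scan returns every row that has a matching cell in either family, so getting only the rows that match in both still needs a row-level check on the client. Below is a minimal plain-Scala sketch of that check; ReturnedCell and rowsWithAllFamilies are hypothetical names, and in real code the input would come from iterating the Results of the scan rather than from a hard-coded Seq.

```scala
// Hypothetical model of the cells an OR (MUST_PASS_ONE) scan handed back:
// only the row key and the column family matter for the row-level check.
final case class ReturnedCell(row: String, family: String)

// Keep the rows in which at least one returned cell came from every required family.
def rowsWithAllFamilies(cells: Seq[ReturnedCell], required: Set[String]): Set[String] =
  cells
    .groupBy(_.row)
    .filter { case (_, rowCells) => required.subsetOf(rowCells.map(_.family).toSet) }
    .keySet

// Sample data: r1 matched in both families, r2 only in "ac", r3 only in "t".
val scanned = Seq(
  ReturnedCell("r1", "ac"),
  ReturnedCell("r1", "t"),
  ReturnedCell("r2", "ac"),
  ReturnedCell("r3", "t")
)

val rowsMatchingBoth = rowsWithAllFamilies(scanned, Set("ac", "t"))
println(rowsMatchingBoth) // Set(r1)
```

The trade-off is that the scan still ships the one-family-only rows (r2 and r3 here) to the client before they are discarded.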

answered Sep 28 '22 12:09 by halil


This is a very hard thing to implement with HBase. The root of your problem is that compound filter (FilterList) predicates are evaluated at the KeyValue (cell) level, not at the row level.

So a query like

give me all rows that have (values in) ColFam1 AND (values in) ColFam2 and also return ColFam3 in the results

is impossible to solve with the standard filters provided in the HBase distribution. Remember that FilterLists use MUST_PASS_ALL evaluation by default, so when the scanner evaluates a KV like ColFam1:qualifX somevalue, it asks: is the CF equal to 'ColFam1' AND equal to 'ColFam2'? That is, of course, never true. When you switch to MUST_PASS_ONE, your results will unintentionally also include rows that have ColFam1 but NOT ColFam2 (or vice versa), in addition to the rows that have both.
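Both outcomes can be reproduced with a toy model of per-cell evaluation. This is a plain-Scala sketch, not the HBase API: Cell, familyIs, mustPassAll, mustPassOne, rowsReturned and the sample rows are hypothetical, and each filter is simply a predicate applied to one cell at a time, which is the behaviour described above.

```scala
// Hypothetical stand-in for an HBase cell (KeyValue).
final case class Cell(row: String, family: String, qualifier: String, value: String)

// A family check on a single cell, similar in spirit to FamilyFilter with CompareOp.EQUAL.
def familyIs(family: String): Cell => Boolean = cell => cell.family == family

// MUST_PASS_ALL: the same cell has to satisfy every filter in the list.
def mustPassAll(filters: Seq[Cell => Boolean]): Cell => Boolean =
  cell => filters.forall(f => f(cell))

// MUST_PASS_ONE: the cell has to satisfy at least one filter in the list.
def mustPassOne(filters: Seq[Cell => Boolean]): Cell => Boolean =
  cell => filters.exists(f => f(cell))

// Sample table: r1 has cells in both families, r2 only in ColFam1.
val cells = Seq(
  Cell("r1", "ColFam1", "qualifX", "somevalue"),
  Cell("r1", "ColFam2", "qualifY", "othervalue"),
  Cell("r2", "ColFam1", "qualifX", "somevalue")
)

// A row shows up in the result if at least one of its cells passes the filter.
def rowsReturned(rowFilter: Cell => Boolean): Set[String] =
  cells.filter(rowFilter).map(_.row).toSet

val familyFilters = Seq(familyIs("ColFam1"), familyIs("ColFam2"))
val andRows = rowsReturned(mustPassAll(familyFilters)) // empty: no single cell is in both families
val orRows = rowsReturned(mustPassOne(familyFilters))  // r1 and r2: r2 gets in without ColFam2
```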

So don't think in SQL-like, row-based terms, where you can say:

the row must have col1=A AND col2=B

HBase's ColumnRangeFilter and (Multiple)ColumnPrefixFilter can help in some use cases, but they all work at the qualifier level only.
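To illustrate what "qualifier level only" means for the question's timestamp qualifiers, here is a plain-Scala sketch of an inclusive qualifier-range check. It assumes, as in the question, that qualifiers are 8-byte big-endian longs (the layout Bytes.toBytes(long) produces) compared as unsigned bytes; longBytes, compareUnsigned and inQualifierRange are hypothetical helpers, not HBase methods. The column family never enters the check.

```scala
import java.nio.ByteBuffer

// Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
def longBytes(v: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(v).array()

// Unsigned lexicographic byte comparison, the order HBase sorts qualifiers in.
def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
  val n = math.min(a.length, b.length)
  var i = 0
  while (i < n) {
    val diff = (a(i) & 0xff) - (b(i) & 0xff)
    if (diff != 0) return diff
    i += 1
  }
  a.length - b.length
}

// Inclusive on both ends, like ColumnRangeFilter(min, true, max, true).
// Only the qualifier bytes are inspected; the family plays no part.
def inQualifierRange(qualifier: Array[Byte], min: Array[Byte], max: Array[Byte]): Boolean =
  compareUnsigned(qualifier, min) >= 0 && compareUnsigned(qualifier, max) <= 0

val lower = longBytes(1000L)
val upper = longBytes(2000L)

// (family, timestamp qualifier) pairs; both families are treated identically.
val cells = Seq(("ac", 1500L), ("t", 1500L), ("t", 2500L))
val kept = cells.filter { case (_, ts) => inQualifierRange(longBytes(ts), lower, upper) }
```

Because the encoding is big-endian, byte order matches numeric order for non-negative timestamps; a negative long would sort after every positive one under this comparison.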

answered Sep 28 '22 11:09 by DataHacker