I have an HBase table, and I need to fetch results from several row ranges, for example rows 1-6, 100-150, and so on. I know that each scan takes a start row and a stop row, but with 6 ranges I would have to run 6 scans. Is there any way to get the results for multiple ranges from a single scan or a single RPC? My HBase version is 0.98.
MultiRowRangeFilter is a filter that supports scanning multiple row key ranges. It constructs the row key ranges from the list passed to it, and that list can be accessed by each region server.
HBase is quite efficient when scanning a single small row key range. If the user needs to specify multiple row key ranges in one scan, the typical solutions are:
1. using a FilterList, which is a list of row key filters;
2. using a SQL layer over HBase, such as Hive or Phoenix, to join with a second table.
However, both solutions are inefficient: neither can use the range information to fast-forward during the scan, so a lot of time is spent on rows outside the ranges. If the number of ranges is very large (e.g. millions), a join is a reasonable solution even though it is slow.
However, there are cases where the user wants to scan only a small number of ranges (e.g. fewer than 1000), and neither solution performs satisfactorily there.
MultiRowRangeFilter supports this use case (scanning multiple row key ranges): it constructs the row key ranges from a user-specified list and fast-forwards during the scan, so the scan is quite efficient.
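To make the fast-forwarding idea concrete, here is a minimal sketch in plain Java. It is only a simulation over a sorted array of string keys, not the HBase implementation: the class name `FastForwardSketch`, the `Range` type, and the half-open `[start, stop)` convention are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative simulation only; this is NOT how HBase implements the filter.
final class FastForwardSketch {

    // Hypothetical helper type: a half-open key range [start, stop).
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    // sortedKeys must be sorted; ranges must be sorted and non-overlapping.
    static List<String> scan(String[] sortedKeys, List<Range> ranges) {
        List<String> out = new ArrayList<>();
        int pos = 0; // current position in the key space
        int r = 0;   // index of the range currently being served
        while (pos < sortedKeys.length && r < ranges.size()) {
            String key = sortedKeys[pos];
            Range range = ranges.get(r);
            if (key.compareTo(range.start) < 0) {
                // Key lies before the range: jump straight to the range start
                // instead of testing every key in between.
                int idx = Arrays.binarySearch(sortedKeys, pos, sortedKeys.length, range.start);
                pos = idx >= 0 ? idx : -idx - 1;
            } else if (key.compareTo(range.stop) >= 0) {
                // Key is past this range: continue with the next range.
                r++;
            } else {
                out.add(key);
                pos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = String.format("%03d", i);
        }
        List<Range> ranges = Arrays.asList(
                new Range("001", "003"),
                new Range("500", "502"),
                new Range("998", "999"));
        System.out.println(scan(keys, ranges)); // [001, 002, 500, 501, 998]
    }
}
```

The loop only touches keys inside a range or right at its edges. A filter that cannot seek would have to test every one of the 1000 keys against every range.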
package chengchen;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowKeyRange;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiRowRangeFilterTest {

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new Exception("Table name not specified.");
        }
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, args[0]);
        try {
            // Each range is (startRow, startRowInclusive, stopRow, stopRowInclusive).
            List<RowKeyRange> ranges = new ArrayList<RowKeyRange>();
            ranges.add(new RowKeyRange(Bytes.toBytes("001"), true, Bytes.toBytes("002"), false));
            ranges.add(new RowKeyRange(Bytes.toBytes("003"), true, Bytes.toBytes("004"), false));
            ranges.add(new RowKeyRange(Bytes.toBytes("005"), true, Bytes.toBytes("006"), false));

            // A single scan whose filter covers all of the ranges.
            Scan scan = new Scan();
            Filter filter = new MultiRowRangeFilter(ranges);
            scan.setFilter(filter);

            // Count the rows returned across all ranges.
            int count = 0;
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r = scanner.next(); r != null; r = scanner.next()) {
                    count++;
                }
            } finally {
                scanner.close();
            }
            System.out.println("++ Scanning finished with count : " + count + " ++");
        } finally {
            table.close();
        }
    }
}
Please see this test case for an implementation in Java.
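One practical detail when mapping numeric ranges such as 1-6 and 100-150 onto row keys: HBase orders row keys lexicographically by bytes, so numeric ranges are only contiguous if the keys have a fixed width, as with the "001"-style keys in the example. A minimal sketch, assuming the row keys are zero-padded decimal strings (the class `PaddedRowKeys`, the helper names, and the width of 3 are hypothetical, not part of any HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers; assumes row keys are zero-padded decimal strings.
final class PaddedRowKeys {

    // Formats a non-negative row number as a fixed-width, zero-padded key.
    static String paddedKey(int n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    // Converts numeric bounds {first, last} into {startKey, stopKey} pairs.
    static List<String[]> toKeyRanges(int[][] bounds, int width) {
        List<String[]> out = new ArrayList<>();
        for (int[] b : bounds) {
            out.add(new String[] { paddedKey(b[0], width), paddedKey(b[1], width) });
        }
        return out;
    }

    public static void main(String[] args) {
        // Without padding, "100" sorts before "6", so "1".."6" is not a clean range.
        System.out.println("100".compareTo("6") < 0); // true
        for (String[] r : toKeyRanges(new int[][] { { 1, 6 }, { 100, 150 } }, 3)) {
            System.out.println(r[0] + " .. " + r[1]); // 001 .. 006, then 100 .. 150
        }
    }
}
```

Each resulting pair can then be turned into bytes with Bytes.toBytes and used as the start and stop row of one range. Whether the stop key should be inclusive depends on how the range is meant to be read.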