I am trying to delete all data from an HBase table that has a timestamp older than a specified timestamp. This applies to all column families and all rows.
Is there a way to do this using the shell as well as the Java API?
HBase has no concept of range delete markers. To delete multiple cells, you need to place a delete marker for every cell, which means you have to scan each row, either on the client side or on the server side. That leaves you with two options:
Scan and delete: This is the cleanest and easiest option. Since you need to delete data from all column families older than a particular timestamp, the scan-and-delete operation can be optimized considerably by using server-side filtering to read only the first key of each row.
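import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

Scan scan = new Scan();
// STOP_TS: the timestamp in question. The upper bound of the range is exclusive.
scan.setTimeRange(0, STOP_TS);
// Crucial optimization: make sure you process multiple rows together
scan.setCaching(1000);
// Crucial optimization: retrieve only row keys (first key of each row, no values)
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
    new FirstKeyOnlyFilter(), new KeyOnlyFilter());
scan.setFilter(filters);
// try-with-resources closes the scanner even if a delete fails
try (ResultScanner scanner = table.getScanner(scan)) {
    List<Delete> deletes = new ArrayList<>(1000);
    Result[] rr;
    do {
        // We set caching to 1000 above;
        // make full use of it and get the next 1000 rows in one go
        rr = scanner.next(1000);
        for (Result r : rr) {
            // Delete(row, ts) covers all cells with timestamp <= ts,
            // so use STOP_TS - 1 to remove only cells strictly older than STOP_TS
            deletes.add(new Delete(r.getRow(), STOP_TS - 1));
        }
        if (!deletes.isEmpty()) {
            table.delete(deletes);
            deletes.clear();
        }
    } while (rr.length > 0);
}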
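One detail worth spelling out is how a cutoff maps onto the two HBase calls involved: `Scan.setTimeRange(min, max)` treats `max` as exclusive, while `new Delete(row, ts)` covers cells with a timestamp less than or equal to `ts`. The sketch below is a hypothetical helper, not part of the HBase API. It assumes cell timestamps are epoch milliseconds (the default when writers do not set them explicitly), and the 30-day retention in `main` is only an illustrative value.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for deriving timestamps; not part of the HBase API. */
public final class CutoffTimestamps {
    private CutoffTimestamps() {}

    /** Epoch-millisecond cutoff: cells strictly older than this should be deleted. */
    public static long cutoffMillis(Instant now, Duration retention) {
        return now.minus(retention).toEpochMilli();
    }

    /** Value for the max argument of Scan.setTimeRange(0, max); max is exclusive. */
    public static long scanMaxExclusive(long cutoff) {
        return cutoff;
    }

    /** Value for new Delete(row, ts); the delete covers timestamps <= ts. */
    public static long deleteTsInclusive(long cutoff) {
        return cutoff - 1;
    }

    public static void main(String[] args) {
        // Assumed retention period of 30 days, purely for illustration
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(30));
        System.out.println("scan range: [0, " + scanMaxExclusive(cutoff) + ")");
        System.out.println("delete ts:  " + deleteTsInclusive(cutoff));
    }
}
```

Keeping the two derived values in one place makes it harder for the scan and the delete to disagree about whether a cell stamped exactly at the cutoff should be removed.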