
Delete all data from HBase table according to time range?

I am trying to delete all data from an HBase table that has a timestamp older than a specified timestamp. This applies to all column families and rows.

Is there a way to do this using the shell as well as the Java API?

asked Sep 30 '16 09:09 by Alifiya Ali




1 Answer

HBase has no concept of range delete markers. To delete multiple cells, you need to place a delete marker for every cell, so each row has to be scanned, either on the client side or on the server side. That leaves two options:

  1. BulkDeleteProtocol: This uses a coprocessor endpoint, so the complete operation runs on the server side. The link has an example of how to use it. A web search will quickly show how to enable a coprocessor endpoint in HBase.
  2. Scan and delete: This is the cleanest and easiest option. Since you need to delete data in all column families older than a particular timestamp, the scan-and-delete operation can be optimized greatly by using server-side filtering to read only the first key of each row.

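    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

    // Assumes `table` is an open Table and STOP_TS is the cutoff timestamp
    Scan scan = new Scan();
    scan.setTimeRange(0, STOP_TS);  // Range is [0, STOP_TS): STOP_TS itself is excluded
    // Crucial optimization: Make sure you process multiple rows together
    scan.setCaching(1000);
    // Crucial optimization: Retrieve only row keys
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
        new FirstKeyOnlyFilter(), new KeyOnlyFilter());
    scan.setFilter(filters);
    // try-with-resources closes the scanner even if a delete fails
    try (ResultScanner scanner = table.getScanner(scan)) {
      List<Delete> deletes = new ArrayList<>(1000);
      Result[] rr;
      do {
        // We set caching to 1000 above
        // make full use of it and get next 1000 rows in one go
        rr = scanner.next(1000);
        for (Result r : rr) {
          // A row-level Delete covers timestamps <= the given value, so
          // STOP_TS - 1 removes only cells strictly older than STOP_TS
          deletes.add(new Delete(r.getRow(), STOP_TS - 1));
        }
        if (!deletes.isEmpty()) {
          table.delete(deletes);
          deletes.clear();
        }
      } while (rr.length > 0);
    }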
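
One boundary detail is worth checking before running this on real data. As far as I know, `Scan.setTimeRange(min, max)` excludes `max`, while a row-level `Delete(row, ts)` covers every cell with a timestamp less than or equal to `ts`. So whether the delete is issued at the cutoff or at the cutoff minus one decides what happens to a cell written exactly at the cutoff. Below is a minimal plain-Java model of those two rules. It makes no HBase calls, and the class name, method names and sample timestamps are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model (not HBase code) of the two timestamp boundary rules. */
class TimestampSemantics {

    /** Models a scan time range: minTs inclusive, maxTs exclusive. */
    static boolean scanMatches(long cellTs, long minTs, long maxTs) {
        return cellTs >= minTs && cellTs < maxTs;
    }

    /** Models a row-level delete at deleteTs: masks cells with timestamp <= deleteTs. */
    static boolean deleteMasks(long cellTs, long deleteTs) {
        return cellTs <= deleteTs;
    }

    /** Returns the cell timestamps that remain visible after a row delete at deleteTs. */
    static List<Long> survivors(List<Long> cellTimestamps, long deleteTs) {
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (!deleteMasks(ts, deleteTs)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long stopTs = 100L;                               // hypothetical cutoff
        List<Long> row = Arrays.asList(50L, 100L, 150L);  // hypothetical cell timestamps

        // The row is returned by the scan because of the cell at 50.
        boolean scanned = row.stream().anyMatch(ts -> scanMatches(ts, 0L, stopTs));
        System.out.println("row scanned: " + scanned);

        // Deleting at the cutoff also masks the cell written exactly at the cutoff.
        System.out.println("delete at stopTs keeps:     " + survivors(row, stopTs));

        // Deleting just below the cutoff masks only strictly older cells.
        System.out.println("delete at stopTs - 1 keeps: " + survivors(row, stopTs - 1));
    }
}
```

If "older than" is meant strictly, the second variant is the one that matches the scan's range. If cells at exactly the cutoff should go as well, delete at the cutoff and widen the scan's upper bound by one so that rows containing only such cells are also visited.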
    
answered Sep 30 '22 16:09 by Ashu Pachauri