
How to mass delete multiple rows in hbase?

I have the following rows with these keys in hbase table "mytable"

user_1 user_2 user_3 ... user_9999999 

I want to use the HBase shell to delete the rows from:

user_500 to user_900

I know the shell has no built-in way to delete a range of rows, but is there a way I could use the "BulkDeleteProcessor" to do this?

I see here:

https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java

I want to just paste in imports and then paste this into the shell, but have no idea how to go about this. Does anyone know how I can use this endpoint from the jruby hbase shell?

    Table ht = TEST_UTIL.getConnection().getTable("my_table");
    long noOfDeletedRows = 0L;
    Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
      new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
      ServerRpcController controller = new ServerRpcController();
      BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
        new BlockingRpcCallback<BulkDeleteResponse>();

      public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
        Builder builder = BulkDeleteRequest.newBuilder();
        builder.setScan(ProtobufUtil.toScan(scan));
        builder.setDeleteType(deleteType);
        builder.setRowBatchSize(rowBatchSize);
        if (timeStamp != null) {
          builder.setTimestamp(timeStamp);
        }
        service.delete(controller, builder.build(), rpcCallback);
        return rpcCallback.get();
      }
    };
    Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class, scan
        .getStartRow(), scan.getStopRow(), callable);
    for (BulkDeleteResponse response : result.values()) {
      noOfDeletedRows += response.getRowsDeleted();
    }
    ht.close();

If there is no way to do this through JRuby, then Java or any other way to quickly delete multiple rows is fine.

Rolando asked Sep 16 '15 00:09



2 Answers

Do you really want to do this in the shell? There are several better ways. One is to use the native Java API:

  • Construct an array list of Delete objects
  • Pass this array list to the Table.delete method

Method 1: if you already know the range of keys.

    public void massDelete(byte[] tableName) throws IOException {
        HTable table = (HTable) hbasePool.getTable(tableName);

        String tablePrefix = "user_";
        int startRange = 500;
        int endRange = 900; // the question asks for user_500 to user_900

        List<Delete> listOfBatchDelete = new ArrayList<Delete>();

        // Build one Delete per row key in the range.
        for (int i = startRange; i <= endRange; i++) {
            String key = tablePrefix + i;
            Delete d = new Delete(Bytes.toBytes(key));
            listOfBatchDelete.add(d);
        }

        try {
            table.delete(listOfBatchDelete);
        } finally {
            // Return the table to the pool whether or not the delete succeeded.
            if (hbasePool != null && table != null) {
                hbasePool.putTable(table);
            }
        }
    }

Method 2: If you want to do a batch delete on the basis of a scan result.

    public void bulkDelete(final HTable table) throws IOException {
        Scan s = new Scan();
        List<Delete> listOfBatchDelete = new ArrayList<Delete>();
        // Add your filters to the scan here, e.g. s.setFilter(yourFilter);
        ResultScanner scanner = table.getScanner(s);
        try {
            // Turn every row the scan returns into a Delete.
            for (Result rr : scanner) {
                Delete d = new Delete(rr.getRow());
                listOfBatchDelete.add(d);
            }
        } finally {
            scanner.close(); // release the scanner's server-side resources
        }
        try {
            table.delete(listOfBatchDelete);
        } catch (Exception e) {
            LOGGER.log(e);
        }
    }
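One caveat if you bound the scan in Method 2 with start and stop rows instead of filters: HBase sorts row keys lexicographically as bytes, not numerically, and the stop row is exclusive by default. So a range from user_500 to user_900 also covers keys such as user_5000 and user_6, and leaves out user_900 itself. The sketch below is only an illustration: it makes no HBase calls, uses a plain TreeSet as a stand-in for the sorted table, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustration only: a TreeSet of Strings stands in for HBase's sorted row keys.
// For plain ASCII keys, String ordering matches byte-wise lexicographic ordering.
final class RowKeyRangeDemo {

    // Returns the keys that fall between `start` (inclusive) and `stop` (exclusive)
    // when keys are ordered lexicographically, as a start/stop-row scan would see them.
    static List<String> keysInScanRange(TreeSet<String> sortedKeys, String start, String stop) {
        return new ArrayList<>(sortedKeys.subSet(start, true, stop, false));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 1; i <= 10000; i++) {
            keys.add("user_" + i);
        }
        List<String> hit = keysInScanRange(keys, "user_500", "user_900");
        System.out.println("contains user_5000: " + hit.contains("user_5000"));
        System.out.println("contains user_6: " + hit.contains("user_6"));
        System.out.println("contains user_900: " + hit.contains("user_900"));
    }
}
```

For a numeric range like the one in the question, building each key explicitly as in Method 1 avoids this problem.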

Now, about using a coprocessor: my one piece of advice is "DON'T USE a coprocessor" unless you are an expert in HBase. Coprocessors have many built-in issues; if you need, I can provide a detailed description. Secondly, when you delete anything from HBase, it is never removed right away. A tombstone marker is attached to the record, and the data is physically removed later, during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive.
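To make the tombstone idea concrete, here is a toy in-memory model. It is not HBase code and not how HBase is implemented: every name in it is made up, and timestamps and versions are ignored. It only shows the sequence "delete records a marker, reads skip marked rows, compaction removes the data".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of tombstone-style deletes. None of these names are HBase APIs,
// and timestamps/versions are ignored.
final class TombstoneToy {
    private final Map<String, String> cells = new HashMap<>();
    private final Set<String> tombstones = new HashSet<>();

    void put(String row, String value) {
        cells.put(row, value);
        tombstones.remove(row); // simplification: real HBase decides visibility by timestamp
    }

    // A delete only records a marker; the stored value stays where it is.
    void delete(String row) {
        tombstones.add(row);
    }

    // Reads skip anything hidden behind a marker.
    String get(String row) {
        return tombstones.contains(row) ? null : cells.get(row);
    }

    // Number of values still physically held, visible or not.
    int storedCount() {
        return cells.size();
    }

    // Stand-in for a major compaction: drop the hidden values and the markers.
    void majorCompact() {
        cells.keySet().removeAll(tombstones);
        tombstones.clear();
    }

    public static void main(String[] args) {
        TombstoneToy t = new TombstoneToy();
        t.put("user_500", "a");
        t.put("user_501", "b");
        t.delete("user_500");
        System.out.println(t.get("user_500")); // null: hidden by the marker
        System.out.println(t.storedCount());   // 2: nothing physically removed yet
        t.majorCompact();
        System.out.println(t.storedCount());   // 1: removed during compaction
    }
}
```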

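Modified code to support batch operations:

    int batchSize = 50;
    int batchCounter = 0;
    for (int i = startRange; i <= endRange; i++) {
        String key = tablePrefix + i;
        Delete d = new Delete(Bytes.toBytes(key));
        listOfBatchDelete.add(d);
        batchCounter++;

        // Send a full batch to the server, then start a new one.
        if (batchCounter == batchSize) {
            table.delete(listOfBatchDelete);
            listOfBatchDelete.clear();
            batchCounter = 0;
        }
    }
    // Send whatever is left over from the last, partial batch.
    if (batchCounter > 0) {
        table.delete(listOfBatchDelete);
    }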
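The batching logic can also be tried out without a cluster. Below is a minimal sketch, assuming plain String keys and a Consumer that stands in for a call like table.delete(...); the class and method names are hypothetical. The point it illustrates is that any leftover keys must be flushed after the loop: the range 500 to 900 holds 401 keys, which does not divide evenly by 50.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: `sink` stands in for a call such as table.delete(listOfDeletes).
final class BatchedDeleteSketch {

    // Sends keys prefix+start .. prefix+end to `sink` in chunks of at most batchSize
    // and returns the number of chunks sent.
    static int deleteInBatches(String prefix, int start, int end, int batchSize,
                               Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int batchesSent = 0;
        for (int i = start; i <= end; i++) {
            batch.add(prefix + i);
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // copy, so the sink may keep the list
                batch.clear();
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) { // flush the final, partial batch
            sink.accept(new ArrayList<>(batch));
            batchesSent++;
        }
        return batchesSent;
    }

    public static void main(String[] args) {
        int sent = deleteInBatches("user_", 500, 900, 50,
                batch -> System.out.println(batch.get(0) + " .. "
                        + batch.get(batch.size() - 1) + " (" + batch.size() + " keys)"));
        System.out.println("batches sent: " + sent);
    }
}
```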

Creating HBase conf and getting table instance.

    Configuration hConf = HBaseConfiguration.create(conf);
    hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
    hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);

    HTable hTable = new HTable(hConf, tableName);
Vikram Singh Chandel answered Sep 19 '22 09:09


If you already know the row keys of the records you want to delete from the HBase table, you can use the following approach:

1. First, create a list of Delete objects for these row keys

    for (int rowKey = 1; rowKey <= 10; rowKey++) {
        deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
    }

2. Then get the Table object from the HBase Connection

    Table table = connection.getTable(TableName.valueOf(tableName));

3. Once you have the Table object, call delete() and pass it the list

    table.delete(deleteList);

The complete code looks like this:

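    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Load the cluster settings from the local config files.
    Configuration config = HBaseConfiguration.create();
    config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
    config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

    String tableName = "users";

    Connection connection = ConnectionFactory.createConnection(config);
    Table table = connection.getTable(TableName.valueOf(tableName));

    // One Delete per row key, user_500 through user_900.
    List<Delete> deleteList = new ArrayList<Delete>();

    for (int rowKey = 500; rowKey <= 900; rowKey++) {
        deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
    }

    table.delete(deleteList);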
Prasad Khode answered Sep 22 '22 09:09