 

Unable to export a table from HBase

Tags:

hbase

I am unable to export a table from HBase into HDFS. Below is the error trace. The table is quite large. Are there any other ways to export it?

I used the command below to export. I increased the RPC timeout, but the job still failed.

sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path

15/05/05 08:50:27 INFO mapreduce.Job:  map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:410)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:230)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
        at java.lang.Thread.run(Thread.java:745)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
        ... 13 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
        at java.lang.Thread.run(Thread.java:745)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:30328)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
        ... 17 more
asked Nov 10 '22 by Jithen


1 Answer

I'd suggest looking at the code and doing a phase-wise export.

If the table is really large, here are some tips you can try: by looking at the code of the Export command you can adjust the cache size and apply a scan filter.

Please see the Export usage from HBase below:

  • Export before release 1.5
  • ExportUtils after release 2.0

Please see the usage command, which gives you more options.
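For instance, invoking the Export class with no arguments should print that usage text (a minimal sketch, assuming the same hdfs user and classpath as in the question):

    # Print the Export usage/help by running it without arguments
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export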

In my experience, adjusting the cache size (not the batch size, which is the number of columns fetched at a time) and/or a custom filter condition should work for you. For example: if your keys start with 0_, where 0 is the region name, first export those rows by specifying the filter, then the next region's data, and so on. Below are the getExportFilter and usage snippets, through which you can understand how this works.

  private static Filter getExportFilter(String[] args) {
    Filter exportFilter = null;
    // The filter criteria is the 6th positional argument
    // (after tablename, outputdir, versions, starttime and endtime)
    String filterCriteria = (args.length > 5) ? args[5]: null;
    if (filterCriteria == null) return null;
    if (filterCriteria.startsWith("^")) {
      // A leading ^ means the rest of the argument is a row-key regex
      String regexPattern = filterCriteria.substring(1, filterCriteria.length());
      exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
    } else {
      // Otherwise the argument is treated as a row-key prefix
      exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
    }
    return exportFilter;
  }
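Because the filter is the sixth positional argument, versions, start time, and end time have to be given explicitly before it. A phase-wise export of one key prefix could then look roughly like the sketch below; the prefix 0_, the output sub-directories, and the end time (Long.MAX_VALUE) are illustrative assumptions:

    # Hypothetical phased export: only rows whose keys start with the prefix 0_
    # (versions=1, starttime=0, endtime=Long.MAX_VALUE so the 6th argument is the filter)
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
        -Dhbase.rpc.timeout=1000000 \
        My_Table /hdfs_path/My_Table_prefix_0 1 0 9223372036854775807 0_

    # A leading ^ switches from a PrefixFilter to a row-key regex filter
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
        My_Table /hdfs_path/My_Table_regex 1 0 9223372036854775807 '^0_.*'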

  /*
   * @param errorMsg Error message.  Can be null.
   */
  private static void usage(final String errorMsg) {
    if (errorMsg != null && errorMsg.length() > 0) {
      System.err.println("ERROR: " + errorMsg);
    }
    System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
      "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
    System.err.println("  Note: -D properties will be applied to the conf used. ");
    System.err.println("  For example: ");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress=true");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
    System.err.println("  Additionally, the following SCAN properties can be specified");
    System.err.println("  to control/limit what is exported..");
    System.err.println("   -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
    System.err.println("   -D " + RAW_SCAN + "=true");
    System.err.println("   -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
    System.err.println("   -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
    System.err.println("   -D " + JOB_NAME_CONF_KEY
        + "=jobName - use the specified mapreduce job name for the export");
    System.err.println("For performance consider the following properties:\n"
        + "   -Dhbase.client.scanner.caching=100\n"
        + "   -Dmapreduce.map.speculative=false\n"
        + "   -Dmapreduce.reduce.speculative=false");
    System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
        + "   -D" + EXPORT_BATCHING + "=10");
  }
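Putting the performance hints from that usage text together with the original command, a tuned export might look like the sketch below. The exact batching property name is an assumption on my part (I believe EXPORT_BATCHING resolves to hbase.export.scanner.batch); check the usage output of your release to confirm it.

    # Tuned export: modest scanner cache, column batching for wide rows,
    # speculative execution disabled, plus the original rpc timeout bump
    sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
        -Dhbase.rpc.timeout=1000000 \
        -Dhbase.client.scanner.caching=100 \
        -Dhbase.export.scanner.batch=10 \
        -Dmapreduce.map.speculative=false \
        -Dmapreduce.reduce.speculative=false \
        My_Table /hdfs_path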
answered Jan 04 '23 by Ram Ghadiyaram