I am unable to export a table from HBase into HDFS. The error trace is below. The table is quite large. Are there any other ways to export it?
I used the command below to export. I increased the RPC timeout, but the job still failed.
sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path
15/05/05 08:50:27 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:410)
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:230)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
at java.lang.Thread.run(Thread.java:745)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
... 13 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
at java.lang.Thread.run(Thread.java:745)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:30328)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 17 more
If the table is really large, here are some tips you could try, based on reading the code of the Export command:
you can adjust the scanner cache size and apply a scan filter.
See the Export usage message below (taken from the HBase source), which lists more options.
In my experience, tuning the cache size (hbase.client.scanner.caching, the number of rows fetched per scanner call; not the batch size, which is the number of columns fetched at a time) and/or using a custom filter condition should work for you. A smaller cache size makes each scanner call return sooner, which helps when the scanner is timing out.
For example, if your row keys start with a prefix like 0_, where 0 identifies the region, first export those rows by specifying the filter,
then export the next region's data, and so on. Below is the getExportFilter snippet, which shows how the filter works.
private static Filter getExportFilter(String[] args) {
  Filter exportFilter = null;
  String filterCriteria = (args.length > 5) ? args[5]: null;
  if (filterCriteria == null) return null;
  if (filterCriteria.startsWith("^")) {
    String regexPattern = filterCriteria.substring(1, filterCriteria.length());
    exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
  } else {
    exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
  }
  return exportFilter;
}
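To get a feel for how the filter argument is interpreted, the branching in getExportFilter can be mimicked without a cluster. The sketch below is only an approximation: the class ExportFilterSketch and its methods are hypothetical names, the regex branch assumes the comparator does a find-style match on the row key, and the prefix branch uses a plain string prefix instead of Bytes.toBytesBinary (so \x escapes are not handled).

```java
import java.util.regex.Pattern;

public class ExportFilterSketch {

    // Mirrors the branching in getExportFilter: a leading '^' selects the
    // regex path (the '^' itself is stripped), anything else is a prefix.
    static String describe(String filterCriteria) {
        if (filterCriteria == null) {
            return "no filter";
        }
        if (filterCriteria.startsWith("^")) {
            return "regex:" + filterCriteria.substring(1);
        }
        return "prefix:" + filterCriteria;
    }

    // Rough stand-in for the resulting filter, working on String row keys.
    // ASSUMPTION: regex matching behaves like Matcher.find(); the real
    // filters operate on byte arrays inside the region server.
    static boolean accepts(String filterCriteria, String rowKey) {
        if (filterCriteria == null) {
            return true;
        }
        if (filterCriteria.startsWith("^")) {
            String regexPattern = filterCriteria.substring(1);
            return Pattern.compile(regexPattern).matcher(rowKey).find();
        }
        return rowKey.startsWith(filterCriteria);
    }

    public static void main(String[] args) {
        String[] rowKeys = {"0_row1", "1_row1"};
        for (String criteria : new String[] {"0_", "^1_.*"}) {
            for (String rowKey : rowKeys) {
                System.out.println(describe(criteria) + " on " + rowKey
                    + " -> " + accepts(criteria, rowKey));
            }
        }
    }
}
```

Note that because the leading ^ is stripped before the pattern is compiled, it acts only as a marker for "this is a regex", not as a start-of-line anchor.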
/*
 * @param errorMsg Error message. Can be null.
 */
private static void usage(final String errorMsg) {
  if (errorMsg != null && errorMsg.length() > 0) {
    System.err.println("ERROR: " + errorMsg);
  }
  System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
    "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
  System.err.println(" Note: -D properties will be applied to the conf used. ");
  System.err.println(" For example: ");
  System.err.println(" -D mapreduce.output.fileoutputformat.compress=true");
  System.err.println(" -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
  System.err.println(" -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
  System.err.println(" Additionally, the following SCAN properties can be specified");
  System.err.println(" to control/limit what is exported..");
  System.err.println(" -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
  System.err.println(" -D " + RAW_SCAN + "=true");
  System.err.println(" -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
  System.err.println(" -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
  System.err.println(" -D " + JOB_NAME_CONF_KEY
    + "=jobName - use the specified mapreduce job name for the export");
  System.err.println("For performance consider the following properties:\n"
    + " -Dhbase.client.scanner.caching=100\n"
    + " -Dmapreduce.map.speculative=false\n"
    + " -Dmapreduce.reduce.speculative=false");
  System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
    + " -D" + EXPORT_BATCHING + "=10");
}
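Putting the usage message and getExportFilter together: the filter is read from args[5], so versions, starttime and endtime have to be supplied before it on the command line. As a rough sketch, the small program below only prints one Export command per row-key prefix (it does not run anything). The class ExportCommandSketch, the prefixes 0_, 1_, 2_, the per-prefix output directories, the caching value of 10, and the placeholder values 1, 0 and Long.MAX_VALUE for versions, starttime and endtime are all assumptions for illustration; adjust them to your table.

```java
import java.util.ArrayList;
import java.util.List;

public class ExportCommandSketch {

    // Builds the argument list for one Export run restricted to a filter.
    // Order follows the usage message:
    //   Export [-D ...]* <tablename> <outputdir> <versions> <starttime> <endtime> <filter>
    static List<String> exportArgs(String table, String outputDir,
                                   String filter, int scannerCaching) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hbase");
        cmd.add("org.apache.hadoop.hbase.mapreduce.Export");
        // Fewer rows per scanner call, so each call returns sooner.
        cmd.add("-Dhbase.client.scanner.caching=" + scannerCaching);
        cmd.add(table);
        cmd.add(outputDir);
        cmd.add("1");                            // versions (assumed placeholder)
        cmd.add("0");                            // starttime (assumed placeholder)
        cmd.add(String.valueOf(Long.MAX_VALUE)); // endtime (assumed placeholder)
        cmd.add(filter);                         // lands in args[5] of getExportFilter
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical prefixes; one export job per prefix, each with its own
        // output directory so the jobs do not collide.
        for (String prefix : new String[] {"0_", "1_", "2_"}) {
            List<String> cmd = exportArgs("My_Table", "/hdfs_path/" + prefix, prefix, 10);
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Running the exports one prefix at a time keeps each MapReduce job smaller, and a failed prefix can be retried without redoing the whole table.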