I got the following error when loading data from Impala to Vertica with Sqoop.
Error: java.io.IOException: Can't export data, please check failed map task logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server.
    at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:233)
    at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
    ... 10 more
Caused by: java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server.
    at com.vertica.jdbc.SStatement.processBatchResults(Unknown Source)
    at com.vertica.jdbc.SPreparedStatement.executeBatch(Unknown Source)
    at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:231)
And I was running the following command:
sudo -u impala sqoop export -Dsqoop.export.records.per.statement=xxx --driver com.vertica.jdbc.Driver --connect jdbc:vertica://host:5433/db --username name --password pw --table table --export-dir /some/dir -m 1 --input-fields-terminated-by '\t' --input-lines-terminated-by '\n' --batch
This error was not raised every time; I had several successful tests loading over 2 million rows of data. So I suspect there is some bad data containing special characters in the rejected rows. This is very annoying because when the error is raised, the MapReduce job rolls back and retries, which leaves lots of duplicate data in the target table.
Does anyone know of a Sqoop export parameter that can be set to handle special characters, or of any way to skip the bad rows, i.e. disable the rollback? Thanks!
This may not be just special characters. If you try to stuff 'abc' into a numeric field, for example, that row would get rejected. Even though you get this error, I believe it is not raised until after the load, and all data that could be committed should be committed (but I would verify that). If you isolate the "missing" rows, you might be able to figure out what is wrong with the data or the field definition.
Common things to look for:
- NULL values being loaded into NOT NULL fields
- VARCHAR lengths treating characters and octets as equivalent: VARCHAR(x) represents x octets, but a UTF-8 character can occupy multiple octets

In the driver, the batch inserts are being replaced with a COPY FROM STDIN statement. You might be able to find the statement in query_requests, although I'm not sure it will help.
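If it helps, here is a minimal sketch of how one might pull recent COPY statements out of the query_requests system table over JDBC. The connection string, credentials, and the ILIKE filter are placeholders; adjust them to your environment.

import java.sql.*;

// Sketch: list recent COPY statements recorded in Vertica's
// v_monitor.query_requests system table (host/db/credentials are placeholders).
public class FindCopyStatements {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT start_timestamp, request "
                   + "FROM v_monitor.query_requests "
                   + "WHERE request ILIKE 'COPY%' "
                   + "ORDER BY start_timestamp DESC LIMIT 20";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:vertica://host:5433/db", "name", "pw");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getTimestamp(1) + "  " + rs.getString(2));
            }
        }
    }
}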
Sqoop doesn't really give you much opportunity to investigate this further (as far as I am aware; I checked the generic JDBC loader). One could look at the return array from executeBatch() and tie it back to your batch to identify the rejected rows. Maybe modify the generic JDBC loader?
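As a rough sketch of that idea, the snippet below runs a small batch directly against Vertica and uses the update counts from the BatchUpdateException to spot the rejected rows. The table, columns, and sample rows are made up for illustration, and some drivers only report counts for the statements executed before the failure, so treat it as a starting point rather than a drop-in fix.

import java.sql.*;

// Sketch: tie BatchUpdateException update counts back to the rows in the batch.
// Table name, columns, and sample data are hypothetical.
public class BatchRejectProbe {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:vertica://host:5433/db", "name", "pw");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO some_table (id, val) VALUES (?, ?)")) {
            String[][] rows = { { "1", "good" }, { "abc", "bad" } };
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.addBatch();
            }
            try {
                ps.executeBatch();
            } catch (BatchUpdateException bue) {
                // getUpdateCounts() is positionally aligned with the batch,
                // so EXECUTE_FAILED entries point at the rejected rows.
                int[] counts = bue.getUpdateCounts();
                for (int i = 0; i < counts.length; i++) {
                    if (counts[i] == Statement.EXECUTE_FAILED) {
                        System.err.println("Rejected row " + i + ": "
                                + String.join("\t", rows[i]));
                    }
                }
            }
        }
    }
}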
Hope this helps.