I'm trying to use ^A as the separator between Key and Value in my reduce output files. I found that the config setting "mapred.textoutputformat.separator" is what I want and this correctly switches the separator to ",":
conf.set("mapred.textoutputformat.separator", ",");
But it can't handle the ^A character:
conf.set("mapred.textoutputformat.separator", "\u0001");
throws this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 94; Character reference "&#
I found this ticket https://issues.apache.org/jira/browse/HADOOP-7542 and see they tried to fix this but reverted the patch due to XML1.1 concerns.
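The parse failure can be reproduced without Hadoop. XML 1.0 does not allow U+0001 at all, not even as a character reference such as `&#1;`, so any job configuration written out as XML 1.0 breaks when it is read back. Here is a minimal sketch using only the JDK's built-in parser. The class name `ControlCharXmlDemo` and the `<value>` wrapper are made up for illustration and are not Hadoop's actual config layout.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class ControlCharXmlDemo {

    // Returns true if the JDK's default XML parser accepts the document,
    // false if it rejects it with a SAX (parse) error.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not a parse error: something is wrong with the environment itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A comma is an ordinary XML character.
        System.out.println("comma:       " + parses("<value>,</value>"));
        // A character reference to U+0001 is illegal in XML 1.0.
        System.out.println("^A, XML 1.0: " + parses("<value>&#1;</value>"));
        // The same reference in a document that declares XML 1.1.
        System.out.println("^A, XML 1.1: "
                + parses("<?xml version=\"1.1\"?><value>&#1;</value>"));
    }
}
```

The parser may also print its own "[Fatal Error]" line to stderr for the rejected document. That output comes from the JDK's default error handler, not from this code.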
So I'm wondering if anyone has had success setting the separator to ^A (which seems pretty common) using an easy workaround, or if I should just settle for the tab separator.
Thanks!
I'm running Hadoop 0.20.2-cdh3u5 on CentOS 6.2
Looking around, I've found maybe three options for solving this problem:
The possible solutions as detailed in the link above are:
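One workaround that sidesteps the XML problem is to never put the raw control character into the configuration. I'm offering this as my own sketch, not necessarily something the linked page lists. The idea is to store a Base64-encoded copy under a separate property and decode it where the output is written. The property name `mapred.textoutputformat.separator.base64` and the class `SeparatorCodec` are made up for illustration, and stock Hadoop does not read that property. The sketch also assumes Java 8+ for `java.util.Base64`. On an older JVM, Commons Codec would be an alternative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SeparatorCodec {

    // Hypothetical property name; Hadoop itself knows nothing about it.
    static final String KEY = "mapred.textoutputformat.separator.base64";

    // Turn the separator into plain ASCII that is safe inside XML 1.0.
    static String encode(String separator) {
        return Base64.getEncoder()
                .encodeToString(separator.getBytes(StandardCharsets.UTF_8));
    }

    // Recover the original separator from the encoded config value.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("\u0001");
        System.out.println(KEY + " = " + encoded); // value is "AQ=="
        // Round trip: decoding gives back the single ^A character.
        System.out.println(decode(encoded).equals("\u0001")); // prints "true"
    }
}
```

In the driver you would call `conf.set(SeparatorCodec.KEY, SeparatorCodec.encode("\u0001"))`. You would then need your own output format (for example, a modified copy of TextOutputFormat) that reads this property, decodes it, and passes the result to the record writer as the key/value separator. That part depends on your Hadoop version, so I have left it out.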