I used LZO to compress the reduce output. I installed Kevin Weil's Hadoop-LZO project and then used the LzoCodec class with my job:
TextOutputFormat.setOutputCompressorClass(job, LzoCodec.class);
Now compression works just fine.
My problem is that the compression result is a .lzo_deflate file, which I just can't decompress.
The lzop utility doesn't seem to support that type of file. LzopCodec is supposed to give a .lzo file, but it did not work, even though it's in the same package as LzoCodec (org.apache.hadoop.io.compress). This may point to a compatibility issue, since I used the old API (0.19) to make compression work.
Answers to this question suggest Python solutions, however I need it in Java.
I'm using Hadoop 1.1.2 and Java 6.
.lzo_deflate means an LZO stream without the usual header and trailer. So you would need to wrap the raw .lzo_deflate stream with the header and trailer expected by lzop. Or at least the header, and then ignore errors from the missing trailer. You'll need to look at the header and trailer documentation.
The "deflate" in the name is an odd choice, but it refers to the gzip analogy, where the raw compressed data format without the gzip header and trailer is called deflate.
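To make the gzip analogy concrete, here is a minimal sketch using only java.util.zip (no Hadoop or LZO involved). It shows that a raw deflate stream is rejected by a gzip reader, and that a gzip file is the same kind of raw stream wrapped in a header and trailer. The class and method names are made up for illustration, and the fixed 10-byte header assumption holds only for gzip data written by GZIPOutputStream, which sets no optional header fields. The relationship between .lzo_deflate and .lzo is analogous, but the lzop framing itself is different and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class: raw deflate vs. gzip framing.
final class RawDeflateDemo {

    // Compress to a raw deflate stream; nowrap=true omits any header/trailer.
    static byte[] rawDeflate(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        DeflaterOutputStream out = new DeflaterOutputStream(buf, deflater);
        out.write(data);
        out.close();
        deflater.end();
        return buf.toByteArray();
    }

    // Compress to gzip: the same kind of stream plus gzip header and trailer.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(buf);
        out.write(data);
        out.close();
        return buf.toByteArray();
    }

    // Decompress a raw deflate stream (no header, no trailer).
    static byte[] rawInflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("truncated or unsupported raw deflate stream");
                }
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        } catch (DataFormatException e) {
            throw new IOException("not a raw deflate stream: " + e.getMessage());
        } finally {
            inflater.end();
        }
    }

    // True if the bytes start with the gzip magic number 0x1f 0x8b.
    static boolean hasGzipMagic(byte[] bytes) {
        return bytes.length >= 2 && (bytes[0] & 0xff) == 0x1f && (bytes[1] & 0xff) == 0x8b;
    }

    // True if a gzip reader accepts the header of these bytes.
    static boolean gzipReaderAccepts(byte[] bytes) {
        try {
            new GZIPInputStream(new ByteArrayInputStream(bytes)).close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Assumption: 10-byte header without optional fields, 8-byte trailer
    // (CRC32 + size), which is what GZIPOutputStream writes.
    static byte[] stripGzipFraming(byte[] gz) {
        return Arrays.copyOfRange(gz, 10, gz.length - 8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "some reducer output, some reducer output".getBytes("UTF-8");
        byte[] raw = rawDeflate(data);
        byte[] gz = gzip(data);
        System.out.println("gzip reader accepts raw stream: " + gzipReaderAccepts(raw));
        System.out.println("gzip reader accepts gzip file:  " + gzipReaderAccepts(gz));
        byte[] body = stripGzipFraming(gz);
        System.out.println("gzip body inflates as raw:      "
                + Arrays.equals(data, rawInflate(body)));
    }
}
```

The point of the sketch is only the framing: a tool that expects the wrapped format cannot read the bare stream, which is the situation lzop is in when handed a .lzo_deflate file.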
I came across the same issue. It happened because I was not using the right codec. Please check the codec in your job configuration.
job.getConfiguration().set("mapred.output.compression.codec","com.hadoop.compression.lzo.LzopCodec");