I have some Pig-generated files with the extension part-r-00000.deflate. I know these are compressed files. How do I generate a normal file in a readable format? When I use hadoop fs -text, I cannot get plaintext output; the output is still binary. How can I fix this problem?
In computing, Deflate (stylized as DEFLATE) is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).
You might be using a fairly old Hadoop version (e.g., 0.20.0) in which fs -text can't inflate the compressed file.
As a workaround you may try this one-liner (based on this answer):
hadoop fs -text file.deflate | perl -MCompress::Zlib -e 'undef $/; print uncompress(<>)'
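If Perl's Compress::Zlib module is not installed, roughly the same thing can be done with Python's standard zlib module. This is only a sketch: it assumes the .deflate file holds zlib-wrapped data (which is what the Perl uncompress call above expects) and falls back to a raw Deflate stream if the zlib header check fails. The script name inflate.py and the function name inflate are made up for illustration.

```python
import sys
import zlib


def inflate(data: bytes) -> bytes:
    """Decompress zlib-wrapped data, falling back to a raw Deflate stream."""
    try:
        # Default wbits expects a zlib header and checksum (RFC 1950).
        return zlib.decompress(data)
    except zlib.error:
        # Negative wbits tells zlib to expect a raw Deflate stream (RFC 1951).
        return zlib.decompress(data, -zlib.MAX_WBITS)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Argument is a local file path, or "-" to read the bytes from stdin.
    src = sys.stdin.buffer if sys.argv[1] == "-" else open(sys.argv[1], "rb")
    sys.stdout.buffer.write(inflate(src.read()))
```

Saved as inflate.py, it could be used in a pipeline such as: hadoop fs -cat file.deflate | python3 inflate.py - > file.txt. Note that it reads the whole file into memory, so it is only suitable for part files of moderate size.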
You can also decompress on the fly and write the result back to HDFS with this command:
hdfs dfs -text file.deflate | hdfs dfs -put - uncompressed_destination_file