
How to read a .deflate file in hadoop

Tags:

hadoop

I have some Pig-generated files with a .deflate extension, such as part-r-00000.deflate. I know these are compressed files. How do I convert one into a normal, readable file? When I use hadoop fs -text, I don't get plaintext output; the output is still binary. How can I fix this problem?

Asked Sep 12 '13 at 08:09 by Himateja Madala




2 Answers

You might be using a fairly old Hadoop version (e.g., 0.20.0), in which fs -text cannot inflate the compressed file.

As a workaround, you can try this one-liner (based on this answer):

hadoop fs -text file.deflate | perl -MCompress::Zlib -e 'undef $/; print uncompress(<>)'
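
If Perl's Compress::Zlib module is not available, the same filter can be sketched in Python. This is a minimal sketch, not part of the original answer. It assumes the .deflate file is a zlib-format stream, which is the same assumption the Perl uncompress call makes. The function name inflate_stream and the script name inflate.py are hypothetical.

```python
import sys
import zlib


def inflate_stream(src, dst, chunk_size=64 * 1024):
    """Read a zlib-compressed byte stream from src and write the
    decompressed bytes to dst, one chunk at a time."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    # Emit anything still buffered inside the decompressor.
    dst.write(decompressor.flush())


# Act as a filter only when data is piped in on stdin.
if __name__ == "__main__" and not sys.stdin.isatty():
    inflate_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Saved as inflate.py, it could be used as: hadoop fs -cat file.deflate | python3 inflate.py > file.txt. Processing the input in chunks avoids loading the whole file into memory, unlike the Perl one-liner, which slurps all of its input first.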
Answered Oct 28 '22 at 04:10 by Lorand Bendig


You can decompress on the fly with this command, which pipes the decompressed output straight back into a new file on HDFS:

hdfs dfs -text file.deflate | hdfs dfs -put - uncompressed_destination_file

Answered Oct 28 '22 at 04:10 by guignol