
View gzipped file content in hadoop

Tags:

hadoop

How can I decompress and view a few lines of a compressed file in HDFS? The command below displays the last few lines of the compressed data:

hadoop fs -tail /myfolder/part-r-00024.gz

Is there a way to use the -text command and pipe its output to the tail command? I tried the following, but it doesn't work:

hadoop fs -text /myfolder/part-r-00024.gz > hadoop fs -tail /myfolder/
nobody asked Aug 12 '15 14:08



3 Answers

The following will show you the specified number of lines without decompressing the whole file:

hadoop fs -cat /hdfs_location/part-00000.gz | zcat | head -n 20

The following will page the file, also without first decompressing the whole of it:

hadoop fs -cat /hdfs_location/part-00000.gz | zmore
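
As a minimal sketch, the first pipeline above can be wrapped in a small shell function. The name hdfs_gz_head and the default of 20 lines are illustrative assumptions, not part of Hadoop. The sketch uses gzip -dc, which behaves the same as zcat for .gz data.

```shell
# Hypothetical helper (the name is made up for illustration):
# print the first N lines of a gzipped file stored in HDFS.
# Usage: hdfs_gz_head <hdfs_path> [line_count]
hdfs_gz_head() {
    # $1 = HDFS path, $2 = number of lines (defaults to 20).
    # hadoop fs -cat streams the raw compressed bytes, gzip -dc
    # decompresses them on the fly, and head exits once it has
    # printed the requested lines, which closes the pipe early.
    hadoop fs -cat "$1" | gzip -dc | head -n "${2:-20}"
}
```

For example, hdfs_gz_head /hdfs_location/part-00000.gz 5 would print the first five lines. Because head closes the pipe early, the hadoop command may report a write error on stderr; the lines already printed are unaffected.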
leo9r answered Nov 29 '22 23:11


Try the following. It should work as long as your file isn't too big (since the whole thing will be decompressed):

hadoop fs -text /myfolder/part-r-00024.gz | tail
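
The attempt in the question fails because > redirects standard output into a local file (here, one named hadoop) and passes the remaining words to the first command as extra arguments, whereas | feeds the output to the next command's standard input. A minimal sketch of the pipe version as a function follows; the name hdfs_text_tail and the default of 10 lines are illustrative assumptions.

```shell
# Hypothetical helper (the name is made up for illustration):
# print the last N lines of a compressed file stored in HDFS.
# Usage: hdfs_text_tail <hdfs_path> [line_count]
hdfs_text_tail() {
    # $1 = HDFS path, $2 = number of lines (defaults to 10).
    # hadoop fs -text decompresses the file to stdout; tail must read
    # the whole stream before it can print the final lines, so the
    # entire file is decompressed.
    hadoop fs -text "$1" | tail -n "${2:-10}"
}
```

For example, hdfs_text_tail /myfolder/part-r-00024.gz 5 would print the last five lines.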
mattinbits answered Nov 30 '22 00:11


I ended up writing a Pig script that loads the file and prints its first 10 records:

A = LOAD '/myfolder/part-r-00024.gz' USING PigStorage('\t');
B = LIMIT A 10;
DUMP B;
nobody answered Nov 30 '22 01:11