How can I decompress and view a few lines of a compressed file in HDFS? The command below shows the end of the file, but the data is still compressed:
hadoop fs -tail /myfolder/part-r-00024.gz
Is there a way to use the -text command and pipe its output to the tail command? I tried the following, but it doesn't work:
hadoop fs -text /myfolder/part-r-00024.gz > hadoop fs -tail /myfolder/
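You can use the Hadoop filesystem shell (hadoop fs) to read any file in HDFS. Its -cat command writes the file's contents to standard output as raw bytes, so a .gz file still has to be piped through a decompressor.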
Solution: zcat is a command-line utility for viewing the contents of a compressed file without replacing it with an uncompressed copy on disk. It expands the compressed file to standard output so you can look at its contents. Running zcat is identical to running gunzip -c.
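As a quick local check of that equivalence, here is a minimal sketch. The temporary directory and the sample lines are made up for illustration. zcat reads from standard input here because on some systems (macOS, for example) zcat does not accept a .gz file name directly.

```shell
# Create a small gzip file in a temporary directory (sample data is invented).
tmpdir=$(mktemp -d)
printf 'line1\nline2\nline3\n' | gzip -c > "$tmpdir/sample.gz"

# Both commands write the decompressed text to standard output;
# sample.gz itself is left untouched on disk.
zcat < "$tmpdir/sample.gz"
gunzip -c "$tmpdir/sample.gz"
```

Both commands should print the same three lines, and the .gz file is still there afterwards.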
Launch WinZip from your Start menu or desktop shortcut. Open the compressed file by clicking File > Open. If your system has the compressed file's extension associated with the WinZip program, just double-click the file.
The following will show you the specified number of lines without decompressing the whole file:
hadoop fs -cat /hdfs_location/part-00000.gz | zcat | head -n 20
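If you run this often, the pipeline above can be wrapped in a small shell function. This is only a sketch: the name hdfs_gz_head and the default of 10 lines are my own choices, not part of Hadoop.

```shell
# Hypothetical helper (not a Hadoop command): print the first N lines
# of a gzip-compressed file stored in HDFS.
#   $1: HDFS path to the .gz file
#   $2: number of lines to show (defaults to 10)
hdfs_gz_head() {
  # -cat streams the raw compressed bytes, zcat decompresses the stream,
  # and head exits after N lines, which ends the pipeline early.
  hadoop fs -cat "$1" | zcat | head -n "${2:-10}"
}
```

Usage would look like `hdfs_gz_head /hdfs_location/part-00000.gz 20`. Because head closes the pipe once it has enough lines, hadoop may print a write error on stderr for large files; as far as I know that is harmless here.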
The following will page the file, also without first decompressing the whole of it:
hadoop fs -cat /hdfs_location/part-00000.gz | zmore
Try the following. It should work as long as your file isn't too big, since the whole file has to be decompressed before tail can print the last lines:
hadoop fs -text /myfolder/part-r-00024.gz | tail
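The same idea as a reusable function, for anyone who wants to pick the number of lines. Again just a sketch: hdfs_gz_tail and the default of 10 lines are names and values I made up for illustration.

```shell
# Hypothetical helper (not a Hadoop command): print the last N lines
# of a compressed file stored in HDFS.
#   $1: HDFS path to the compressed file
#   $2: number of lines to show (defaults to 10)
hdfs_gz_tail() {
  # -text decompresses the file itself, so no zcat is needed here.
  # tail has to read the entire decompressed stream to find the end.
  hadoop fs -text "$1" | tail -n "${2:-10}"
}
```

Usage would look like `hdfs_gz_tail /myfolder/part-r-00024.gz 5`. Nothing is written to disk, but the run time grows with the size of the file.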
I ended up writing a Pig script that loads the file and dumps the first 10 records:
A = LOAD '/myfolder/part-r-00024.gz' USING PigStorage('\t');
B = LIMIT A 10;
DUMP B;