 

How to unzip a file in Hadoop?

Tags:

hadoop

I was trying to unzip a zip file stored in the Hadoop file system (HDFS) and store the result back in HDFS. I tried the following commands, but none of them worked.

hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp

When I run those commands, I get errors such as "gzip: stdin has more than one entry--rest ignored", "cat: Unable to write to output stream." and "Error: Could not find or load main class put" on the terminal. Any help?

Edit 1: I don't have access to a UI, so only the command line is available. The unzip and gzip utilities are installed on my Hadoop machine. I'm using Hadoop 2.4.0.

Abhishek asked Mar 17 '15 06:03


People also ask

How do I unzip a file in Hadoop?

HDFS has no built-in command that unzips a file in place. You have to uncompress the files on a local machine and then put the result back on HDFS. With Hue, you can automate this using a shell action in Oozie: the files are uncompressed locally and then uploaded to HDFS.
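A minimal sketch of the local route, for a .zip archive like the one in the question. This route matters there because gzip -d can only extract a .zip archive that contains a single entry; the "stdin has more than one entry--rest ignored" error means the archive holds several files. The function name hdfs_unzip, its arguments, and the use of mktemp for a scratch directory are assumptions for illustration, not part of any Hadoop tool; it also assumes the hdfs and unzip commands are on the PATH.

```shell
#!/usr/bin/env bash
# hdfs_unzip SRC_ZIP DEST_DIR  (hypothetical helper, not a Hadoop command)
# Copies a .zip archive out of HDFS to a local scratch directory, extracts
# every entry with unzip, and uploads the extracted files to DEST_DIR in HDFS.
# Assumes `hdfs` and `unzip` are installed and on the PATH.
hdfs_unzip() {
  local src="$1" dest="$2" workdir status=0
  workdir="$(mktemp -d)" || return 1

  # 1. Fetch the archive to local disk (unzip needs a seekable file, not a pipe).
  # 2. Extract all entries locally.
  # 3. Create the HDFS target directory and upload everything extracted.
  # The && chain stops at the first failing step; its exit code goes to $status.
  hdfs dfs -get "$src" "$workdir/archive.zip" &&
    unzip -q "$workdir/archive.zip" -d "$workdir/extracted" &&
    hdfs dfs -mkdir -p "$dest" &&
    hdfs dfs -put "$workdir/extracted"/* "$dest"/ ||
    status=$?

  # Always remove the local scratch copy, even when a step failed.
  rm -rf "$workdir"
  return "$status"
}
```

For the question's file, the call would be hdfs_unzip /tmp/test.zip /tmp/test_unzipped (the destination directory name is only an example). The extracted files must fit on the local disk of the machine running the command, which is the main cost of this approach compared with streaming.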

How do I unzip a .gz file in Hadoop?

Read the file from HDFS with hdfs dfs -cat, pipe it through gzip -d to decompress it, and pipe the output into hdfs dfs -put - to store the decompressed data in HDFS under a new name.
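A sketch of that streaming pipeline as a reusable shell function. The name hdfs_decompress and choosing the decompressor from the file extension are assumptions for illustration. It uses the hdfs dfs form of the command (hadoop put without fs is treated as a class name, which is what produces the "Could not find or load main class put" error in the question), gives -put an explicit destination file, and sets pipefail so that a failure in any stage, not only the last one, makes the function return a non-zero status.

```shell
#!/usr/bin/env bash
# hdfs_decompress SRC DEST_FILE  (hypothetical helper, not a Hadoop command)
# Streams SRC out of HDFS, decompresses it inside the pipe, and writes the
# result to DEST_FILE in HDFS without staging anything on local disk.
hdfs_decompress() {
  local src="$1" dest="$2"
  local -a decompress
  # Choose the decompressor from the extension (assumed to match the format).
  case "$src" in
    *.gz)  decompress=(gzip -dc) ;;
    *.bz2) decompress=(bzip2 -dc) ;;
    *)
      echo "hdfs_decompress: unsupported extension: $src" >&2
      return 2
      ;;
  esac
  # The subshell keeps `set -o pipefail` from leaking into the caller's shell.
  # With pipefail, the pipeline fails if any stage fails, not only -put.
  (
    set -o pipefail
    hdfs dfs -cat "$src" | "${decompress[@]}" | hdfs dfs -put - "$dest"
  )
}
```

hdfs dfs -put refuses to overwrite an existing DEST_FILE, and if an earlier stage fails it may still leave a partial or empty file there, so check the return status before relying on the output.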

How do I open a Hadoop file?

You can read any file with the Hadoop filesystem shell (hadoop fs or hdfs dfs). Its -cat subcommand prints a file's contents to standard output.


1 Answer

To decompress a gzipped file that is already in HDFS, I use the following (for a bzip2 file, use bzip2 -d in place of gzip -d):

hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/<data>

If the file sits on your local drive, then

zcat <infile> | hdfs dfs -put - /data/<outfile>
Jon answered Sep 17 '22 18:09