I was trying to unzip a zip file stored in the Hadoop file system (HDFS) and store the result back in HDFS. I tried the following commands, but none of them worked:
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp
When I run those commands, I get errors such as:
gzip: stdin has more than one entry--rest ignored
cat: Unable to write to output stream.
Error: Could not find or load main class put
Any help?
Edit 1: I don't have access to a UI, so I can only use the command line. The unzip and gzip utilities are installed on my Hadoop machine. I'm using Hadoop version 2.4.0.
You cannot unzip files directly in HDFS. You have to decompress them on the local file system and then put the results back into HDFS. To automate this, you can use Hue, which runs a shell action in Oozie: the files are decompressed locally and then uploaded to HDFS.
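A minimal sketch of that local round trip for a .zip archive, written as a shell function. It assumes the hdfs CLI and unzip are on the PATH (the asker says unzip is installed); the function name unzip_from_hdfs and the example paths are hypothetical. Extracting the whole archive locally matters because a zip file can hold several entries, while gzip -d only decodes the first one, which is what the "stdin has more than one entry--rest ignored" message reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unzip an archive stored in HDFS by staging it locally.
# Usage: unzip_from_hdfs <hdfs-zip-path> <hdfs-target-dir>
unzip_from_hdfs() {
  local src_zip="$1" dst_dir="$2" work rc

  # Local scratch directory, removed again at the end.
  work="$(mktemp -d)" || return 1

  # 1. Copy the archive out of HDFS.
  # 2. Extract all of its entries locally (a zip can hold many files).
  # 3. Create the target directory in HDFS and upload the extracted files.
  hdfs dfs -get "$src_zip" "$work/archive.zip" &&
    unzip -q "$work/archive.zip" -d "$work/extracted" &&
    hdfs dfs -mkdir -p "$dst_dir" &&
    hdfs dfs -put "$work/extracted/"* "$dst_dir"
  rc=$?

  rm -rf "$work"
  return "$rc"
}
```

For example, unzip_from_hdfs /tmp/test.zip /tmp/test_unzipped would leave the extracted entries under /tmp/test_unzipped in HDFS.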
Go to the folder where the tar files are. This command first reads the data from HDFS, decompresses it with gzip, and finally stores the decompressed data in a new directory under a name of your choosing.
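A sketch of a command matching that description, assuming the source is a single gzip-compressed file; the function name gunzip_in_hdfs and its arguments are hypothetical. It runs the pipeline with pipefail so that a failure anywhere in it (for example gzip rejecting the input) shows up in the exit status, rather than only the status of the final put.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress one gzip file stored in HDFS into a new
# HDFS directory under a name you choose, without writing to local disk.
# Usage: gunzip_in_hdfs <hdfs-gz-file> <hdfs-target-dir> <new-file-name>
gunzip_in_hdfs() (
  # The body runs in a subshell, so 'pipefail' does not leak into the caller.
  set -o pipefail
  src="$1" dst_dir="$2" name="$3"

  # Create the target directory, then stream: HDFS -> gzip -d -> HDFS.
  hdfs dfs -mkdir -p "$dst_dir" &&
    hdfs dfs -cat "$src" | gzip -d | hdfs dfs -put - "$dst_dir/$name"
)
```

For example, gunzip_in_hdfs /tmp/data.gz /tmp/unzipped data.txt would write the decompressed content to /tmp/unzipped/data.txt.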
You can use the Hadoop file system shell to read any file; its cat command prints a file's contents to standard output.
To decompress a gzipped file, I use the following (for a bzip2 file, replace gzip -d with bzip2 -d):
hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/<data>
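Before picking a pipeline, it can help to check what the HDFS file actually contains, since a .zip archive, a gzip stream and a bzip2 stream need different tools. A hedged sketch that reads only the first four bytes and compares them with the standard signatures (1f 8b for gzip, "BZh" for bzip2, "PK" followed by 03 04 for a zip local file header); the function name detect_compression is hypothetical. Stopping after a few bytes closes the pipe early, so hdfs dfs -cat may print "Unable to write to output stream", the same message seen in the question; the sketch discards that stderr output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the compression format of a file in HDFS
# by inspecting its first four bytes. Prints gzip, bzip2, zip or unknown.
detect_compression() {
  local magic
  # od prints the bytes as hex; tr strips od's spaces and trailing newline.
  magic="$(hdfs dfs -cat "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')"
  case "$magic" in
    1f8b*)    echo gzip ;;
    425a68*)  echo bzip2 ;;   # "BZh"
    504b0304) echo zip ;;     # "PK" 03 04
    *)        echo unknown ;;
  esac
}
```

A result of gzip or bzip2 means the streaming pipeline with gzip -d or bzip2 -d applies; zip means the archive has to be extracted with unzip on the local side.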
If the file is on your local drive instead, use:
zcat <infile> | hdfs dfs -put - /data/<outfile>
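If several compressed files are on the local drive, the same zcat pipeline can run in a loop. A minimal sketch, assuming the files end in .gz and each HDFS output should keep the source file name minus its .gz suffix; the function name upload_gunzipped and the target directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream every local *.gz file in a directory into HDFS,
# decompressed, naming each HDFS file after its source minus ".gz".
# Usage: upload_gunzipped <local-dir> <hdfs-target-dir>
upload_gunzipped() (
  # Subshell body: pipefail and 'exit' stay local to this function.
  set -o pipefail
  local_dir="$1" dst_dir="$2"

  hdfs dfs -mkdir -p "$dst_dir" || exit 1
  for f in "$local_dir"/*.gz; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    name="$(basename "$f" .gz)"
    zcat "$f" | hdfs dfs -put - "$dst_dir/$name" || exit 1
  done
)
```

For example, upload_gunzipped ./exports /data/unzipped would turn ./exports/a.csv.gz into /data/unzipped/a.csv in HDFS.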