 

Unzip all gz files in all subdirectories in the terminal

Is there a way to unzip all gz files in place, each in the folder that contains it, when the gz files are spread across subdirectories? A query for

find -type f -name "*.gz"

gives results like this:

./datasets/auto/auto.csv.gz
./datasets/prnn_synth/prnn_synth.csv.gz
./datasets/sleep/sleep.csv.gz
./datasets/mfeat-zernike/mfeat-zernike.csv.gz
./datasets/sonar/sonar.csv.gz
./datasets/wine-quality-white/wine-quality-white.csv.gz
./datasets/ring/ring.csv.gz
./datasets/diabetes/diabetes.csv.g
Peter Mølgaard Pallesen asked Jan 03 '23 19:01



1 Answer

If you want to run "gzip -d" on each of those files:

cd theparentdir && gzip -d $(find ./ -type f -name '*.gz')

and then, to gzip them back:

cd theparentdir && gzip $(find ./ -type f -name '*.csv')

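This will, however, fail in several cases:

  • if filenames contain special characters (spaces, tabs, newlines, etc.), because the unquoted $(...) output is split into separate words
  • if filenames contain glob characters (*, ?, [), which the shell expands after splitting
  • or if there are too many files to fit on a single gzip command line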
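The first failure mode can be reproduced with a small sketch like the one below. The temporary directory and the file name "set one.csv" are made up for illustration; only the unquoted $(find ...) pattern comes from the answer.

```shell
#!/bin/sh
# Hypothetical demo tree: one compressed file whose path contains spaces.
demo=$(mktemp -d)
mkdir -p "$demo/my data"
printf 'a,b\n' > "$demo/my data/set one.csv"
gzip "$demo/my data/set one.csv"

cd "$demo" || exit 1

# The unquoted substitution splits "./my data/set one.csv.gz" on spaces,
# so gzip is handed three paths that do not exist and reports an error.
if gzip -d $(find . -type f -name '*.gz') 2>/dev/null; then
    split_result=ok
else
    split_result=failed
fi
echo "$split_result"
```

The compressed file is left untouched, which is why the null-separated and -exec variants are preferable for arbitrary file names.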

A solution, if you have GNU find, would be:

find ... -print0 | xargs -0 gzip -d # for the gunzip case; NUL-separated names are safe for any filename, including ones containing newlines
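As a rough, self-contained sketch of that pipeline, the script below builds a throwaway tree with awkward file names and decompresses everything in it. The directory and file names are hypothetical, and the -r flag (skip running gzip when nothing is found) is assumed to be available, as it is in GNU xargs.

```shell
#!/bin/sh
# Hypothetical demo tree with a space and a newline in the file names.
demo=$(mktemp -d)
mkdir -p "$demo/my data"
nl_name=$(printf 'new\nline.csv')
printf 'a,b\n' > "$demo/my data/set one.csv"
printf 'c,d\n' > "$demo/$nl_name"
gzip "$demo/my data/set one.csv" "$demo/$nl_name"

# find emits each path terminated by a NUL byte; xargs -0 splits only on NUL,
# so spaces and newlines inside names are passed to gzip intact.
find "$demo" -type f -name '*.gz' -print0 | xargs -0 -r gzip -d

# Count what is still compressed (expected to be nothing).
remaining=$(find "$demo" -type f -name '*.gz' | wc -l | tr -d ' ')
echo "remaining: $remaining"
```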

Another (arguably better) solution, which works with GNU find and with any find that implements the POSIX "-exec ... {} +" form:

cd theparentdir && find ./ -type f -name '*.gz' -exec gzip -d '{}' '+'

and to re-zip all csv files in that parent directory and all its subdirectories:

cd theparentdir && find ./ -type f -name '*.csv' -exec gzip '{}' '+'

"+" tells find to put as many found files as it can on each gzip invocation (instead of running one gzip invocation per file, which is resource intensive, inefficient and slow). This is similar to xargs, but with some benefits: a single command, and no pipe needed.
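The batching can be made visible with a harmless stand-in command. The sketch below uses echo in place of gzip and counts output lines, one line per invocation; the five empty csv files are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo tree with five empty csv files.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    : > "$demo/file$i.csv"
done

# With \; find starts echo once per file, so there is one output line per file.
per_file=$(find "$demo" -type f -name '*.csv' -exec echo {} \; | wc -l | tr -d ' ')

# With + find appends all five paths to a single echo, giving one output line.
batched=$(find "$demo" -type f -name '*.csv' -exec echo {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file, batched invocations: $batched"
```

With only five short paths everything fits in one batch; for a very large tree find would start several batches, each as full as the argument-length limit allows.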

Olivier Dulac answered Jan 08 '23 07:01
