I have some sorted, gzipped files in a directory. How do I combine some of them into another sorted, gzipped file? Right now I'm using explicit FIFOs. Is there a way to do it in bash without them? I'm a bit of a bash noob, so please excuse my lack of style.
#!/bin/bash
# Invocation ./merge [files ... ]
# Turns an arbitrary set of sorted, gzipped files into a single sorted, gzipped file,
# printed to stdout. Redirect this script's output!
for f in $@
do
mkfifo $f.raw
gzcat $f > $f.raw &
# sort -C $f.raw
done
sort -mu *.raw | gzip -c # prints to stdout.
rm -f *.raw
I'm looking to convert this into something like...
sort -mu <(gzcat $1) <(gzcat $2) <(gzcat $3) ... | gzip -9c # prints to stdout.
...but I don't know how. Do I need a loop that builds the parameters into a string? Is there some sort of magic shortcut for this? Maybe map gzcat $@?
NOTE: Each of the files is in excess of 10GB (and 100GB unzipped). I have a 2TB drive, so this isn't really a problem. Also, this program MUST run in O(n) or it becomes unfeasible.
In Bash, you can combine eval with process substitution. Assuming the file names don't contain spaces (which, given that you use $@ instead of "$@", is probably the case), something like this works:
cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd $file)"
done
eval "$cmd" | gzip -c9 > outputfile.gz
You can also use bash -c "$cmd" instead of eval on the last line. If there are spaces in the file names, you have to work a bit harder. This works if the names don't contain single quotes:
cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd '$file')"
done
eval "$cmd" | gzip -c9 > outputfile.gz
With single quotes in the file names too, you have to work a lot harder.
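One possible way to handle that case, sketched here rather than taken from the answer above, is to let Bash do the quoting itself with printf's %q format, which escapes a string so that eval reads it back as a single word. The function name merge_sorted_gz is hypothetical, and the sketch assumes Bash (for process substitution and printf -v) and file names that do not begin with a dash:

```bash
#!/bin/bash
# Sketch only: merge sorted, gzipped files whose names may contain
# spaces or quotes. merge_sorted_gz is a hypothetical helper name.
merge_sorted_gz() {
    local cmd="sort -mu" file quoted
    for file in "$@"
    do
        # %q escapes the name so eval sees it as exactly one word.
        printf -v quoted '%q' "$file"
        cmd="$cmd <(gzip -cd $quoted)"
    done
    # Quote $cmd so the escaped names reach eval unchanged.
    eval "$cmd" | gzip -c9
}
```

It would be called as merge_sorted_gz "$@" > outputfile.gz. Because each name is escaped before it is pasted into the command string, spaces, single quotes, and glob characters in the names should all survive the eval.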