
Combining sorted files with fifos

I have some sorted, gzipped files in a directory. How do I combine some of them into another sorted, gzipped file? Right now I'm using explicit fifos. Is there a way to do it in bash without them? I'm a bit of a bash noob, so please excuse my lack of style.

#!/bin/bash
# Invocation ./merge [files ... ]
# Turns an arbitrary set of sorted, gzipped files into a single sorted, gzipped file,
# printed to stdout. Redirect this script's output!
for f in $@
do
    mkfifo $f.raw
    gzcat $f > $f.raw &
    # sort -C $f.raw
done
sort -mu *.raw | gzip -c # prints to stdout.
rm -f *.raw

I'm looking to convert this into something like...

sort -mu <(gzcat $1) <(gzcat $2) <(gzcat $3) ... | gzip -9c # prints to stdout.

...but don't know how. Do I need a loop building the parameters to string? Is there some sort of magic shortcut for this? Maybe map gzcat $@?

NOTE: Each of the files is in excess of 10GB (and 100GB unzipped). I have a 2TB drive, so this isn't really a problem. Also, this program MUST run in O(n) or it becomes unfeasible.

Clark Gaebel asked Jun 13 '11 04:06


1 Answer

You can combine eval and 'process substitution' in Bash. Assuming the basic file names don't contain spaces (which, given that you use $@ instead of "$@", is probably the case), then something like:

cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd $file)"
done
eval $cmd | gzip -c9 > outputfile.gz

You can also use bash -c "$cmd" instead of eval $cmd on the last line. If there are spaces in the file names, you have to work a bit harder. This works if the names don't contain single quotes:

cmd="sort -mu"
for file in "$@"
do cmd="$cmd <(gzip -cd '$file')"
done
eval "$cmd" | gzip -c9 > outputfile.gz

With single quotes in the file names too, you have to work a lot harder.
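One possible way to do that extra work, offered as a sketch rather than the definitive fix, is to let Bash quote each name itself with the printf builtin's %q format before the string is handed to eval. The function name merge_gz is hypothetical and not part of the original answer; the sketch assumes Bash (for process substitution and %q) and writes the merged result to stdout instead of a fixed output file.

```shell
#!/bin/bash
# merge_gz is a hypothetical helper name used only for illustration.
# Usage: merge_gz file1.gz file2.gz ... > merged.gz
merge_gz() {
    local cmd="sort -mu" file
    for file in "$@"; do
        # printf %q rewrites the name in a form the shell can re-read,
        # so spaces and single quotes survive the eval below.
        cmd+=" <(gzip -cd $(printf '%q' "$file"))"
    done
    # Quote $cmd so the assembled command reaches eval unchanged.
    eval "$cmd" | gzip -c9
}
```

Because the quoting is generated by the shell rather than written by hand, the same loop should cover names with spaces, single quotes, or both. It still relies on eval, so it is only as safe as the %q quoting of the names passed in.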

Jonathan Leffler answered Sep 25 '22 14:09