
How to merge sorted files without using a temporary file?

I'm trying to merge many sorted files in a UNIX/Linux script with sort -m, and I noticed that sort first writes the result to a temporary file and then copies it to the destination. My understanding of -m is that it assumes the inputs are already sorted, so a temporary file should be unnecessary; it wastes both disk space and CPU cycles. (I'm using sort in a pipeline, which gets stuck waiting for sort to output anything.) Is there a way to tell sort not to use temporary files when merging sorted files? Or is there a better tool that doesn't?

The exact command line looks like:

$ sort -m -s -t '_' -k 1,1n -k 2,2n <(gunzip <file_1) [...] <(gunzip <file_n) | gzip >output

I'm using sort from GNU coreutils 5.97.

Matei David asked Jul 06 '11 15:07


2 Answers

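Check out these options from man sort, if your version of sort supports them; they might let you minimize the amount of temporary space needed for merging. Going by the description of the first one, sort only falls back to temp files when there are more inputs than NMERGE, so setting it to at least the number of files should keep the merge in a single pass.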

--batch-size=NMERGE
    merge at most NMERGE inputs at once; for more use temp files

--compress-program=PROG
    compress temporaries with PROG; decompress them with PROG -d
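As a rough sketch of how --batch-size could be applied (assuming a GNU sort recent enough to have the option), the snippet below merges three already-sorted files in one pass by setting the batch size to the number of inputs. The sample files and the variable names tmpdir, n and merged are made up for illustration; the key options are copied from the question.

```shell
# Hypothetical sample inputs: three small files, each already sorted
# on field 1 and then field 2 (numeric, '_' as the separator).
tmpdir=$(mktemp -d)
printf '1_1\n3_2\n' > "$tmpdir/a"
printf '2_1\n4_1\n' > "$tmpdir/b"
printf '2_5\n5_1\n' > "$tmpdir/c"

# Put the input files in the positional parameters and count them.
set -- "$tmpdir"/*
n=$#

# Allow all n inputs to be merged at once, so sort has no reason to
# stage intermediate batches in temporary files.
merged=$(sort -m -s -t '_' -k 1,1n -k 2,2n --batch-size="$n" "$@")
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```

Note that sort may refuse a batch size larger than the number of file descriptors the process is allowed to open, so with very many inputs the open-files limit may need raising as well.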

Marcin answered Oct 18 '22 12:10


Running with GNU coreutils 6.10, I'm not seeing that problem.

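One thing to check in your command line is the <(...) process substitution. In bash it is normally implemented with a pipe (a /dev/fd/N path or a named pipe) rather than a temporary file, and each gunzip runs concurrently with sort. However, sort -m cannot print its first line until every input has produced one, so a slow gunzip can stall the whole pipeline. Could that be the delay you are seeing?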
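A quick way to see how your shell implements <(...) is to ask what kind of file the command is actually handed. This is only a sketch and assumes bash is installed; describe is a hypothetical helper name, and true stands in for the real gunzip command.

```shell
# Run the check in bash explicitly, since <(...) is not POSIX sh syntax.
kind=$(bash -c '
    # describe: report what kind of file the given path refers to.
    describe() {
        if [ -p "$1" ]; then
            echo "pipe"
        elif [ -f "$1" ]; then
            echo "regular file"
        else
            echo "other"
        fi
    }
    # The shell replaces <(true) with a path; describe inspects that path.
    describe <(true)
')
printf '%s\n' "$kind"
```

If this reports a regular file, the input is being staged on disk before the command sees it; if it reports a pipe, the data is streamed.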

I ran this command:

$ sort -m a b c d e f g h i j | more

and it did not create a temp file for the output. I piped the output into more so that sort would block, and then looked in /proc to see what sort was doing. It had all of the input files open, plus the pipe to the more command, but that was it. No temporary file:

$ ls -l /proc/1308/fd
total 0
lrwx------ 1 brianb brianb 64 2014-06-24 18:50 0 -> /dev/pts/0
l-wx------ 1 brianb brianb 64 2014-06-24 18:50 1 -> pipe:[217016034]
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 10 -> /home/brianb/h
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 11 -> /home/brianb/i
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 12 -> /home/brianb/j
lrwx------ 1 brianb brianb 64 2014-06-24 18:50 2 -> /dev/pts/0
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 3 -> /home/brianb/a
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 4 -> /home/brianb/b
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 5 -> /home/brianb/c
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 6 -> /home/brianb/d
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 7 -> /home/brianb/e
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 8 -> /home/brianb/f
lr-x------ 1 brianb brianb 64 2014-06-24 18:50 9 -> /home/brianb/g
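
To repeat this check from a script rather than interactively, something along these lines may work. It is a sketch that assumes Linux with /proc; the helper name list_open_files and the FIFO used to keep sort waiting are illustrative choices, not what was done in the test above.

```shell
# list_open_files is a hypothetical helper; it relies on Linux /proc.
list_open_files() {
    ls -l "/proc/$1/fd"
}

workdir=$(mktemp -d)
printf '1\n3\n' > "$workdir/a"   # an ordinary, already-sorted input
mkfifo "$workdir/b"              # a second input that we control

# Start the merge in the background; it will wait for data on the FIFO.
sort -m -n "$workdir/a" "$workdir/b" > "$workdir/out" &
sort_pid=$!

# Opening the FIFO for writing returns once sort has opened it for reading,
# so from here on sort is running and blocked waiting for input.
exec 9> "$workdir/b"

# Capture what sort has open while it is blocked, then show it.
fds=$(list_open_files "$sort_pid")
printf '%s\n' "$fds"

# Feed the remaining input, close the FIFO so sort sees EOF, and finish.
printf '2\n' >&9
exec 9>&-
wait "$sort_pid"

result=$(cat "$workdir/out")
rm -rf "$workdir"
```

Any temporary file that sort had created would show up in the listing as an extra entry, typically under /tmp or whatever directory TMPDIR or -T points to.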
Brian Beach answered Oct 18 '22 13:10