How do we sort faster using unix sort?

Tags: unix, sorting

We are sorting a 5GB file with 37 fields on 5 keys. The big file is composed of 1000 files of 5MB each.

After 190 minutes it still hasn't finished.

I am wondering whether there are other ways to speed up the sorting. We chose Unix sort because we don't want it to use up all the memory, so any purely memory-based approach is not okay.

What is the advantage of sorting each file independently and then using the -m option to merge them?

lamwaiman1988 asked Aug 16 '11 06:08


1 Answer

Buffer it in memory using -S. For example, to use (up to) 50% of your memory as a sorting buffer do:

sort -S 50% file 

Note that recent versions of GNU sort (coreutils 8.6 and later) can sort in parallel. My experience is that it uses multiple cores automatically. You can set the number of threads explicitly using --parallel. To sort using 4 threads:

sort --parallel=4 file 

So all in all, you should put everything into one file and execute something like:

sort -S 50% --parallel=4 file 
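
To see how this compares with the approach raised in the question (sort each file on its own, then combine the results with -m), here is a minimal sketch on tiny made-up data. The file names, the comma delimiter, and the two key fields are hypothetical stand-ins for the real 37-field input, and --parallel assumes GNU sort. LC_ALL=C is set only so that the ordering is byte-wise and reproducible.

```shell
#!/bin/sh
# Sketch only: tiny made-up data stands in for the real 5GB input.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two hypothetical comma-separated chunk files.
printf 'b,2,x\na,9,y\n' > "$workdir/part1.csv"
printf 'a,1,z\nc,0,w\n' > "$workdir/part2.csv"

# Hypothetical keys: field 1 as text, then field 2 numerically.
keys="-t, -k1,1 -k2,2n"

# One pass: sort reads all the chunk files named on the command line,
# using the buffer and thread options from the answer (GNU sort assumed).
one_pass=$(LC_ALL=C sort -S 50% --parallel=4 $keys "$workdir"/part*.csv)

# Alternative from the question: sort each chunk separately ...
for f in "$workdir"/part*.csv; do
    LC_ALL=C sort $keys "$f" > "$f.sorted"
done
# ... then merge the already-sorted chunks with -m, using the same keys.
merged=$(LC_ALL=C sort -m $keys "$workdir"/part*.csv.sorted)

printf '%s\n' "$one_pass"
```

Both routes should produce the same ordering as long as the per-file sorts and the merge use identical keys and locale. Which one is faster on the real data is something to measure rather than assume.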
Malcolm answered Oct 02 '22 17:10