Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting multiple keys with Unix sort

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

like image 391
Chris Kloberdanz Avatar asked Oct 13 '22 16:10

Chris Kloberdanz


People also ask

How do I sort multiple columns in Unix?

Use the -k option to sort on a certain column. For example, use " -k 2 " to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data ( +2 for the third, etc.).

How do I sort a list of numbers in Unix?

How to sort by number. To sort by number pass the -n option to sort . This will sort from lowest number to highest number and write the result to standard output. Suppose a file exists with a list of items of clothing that has a number at the start of the line and needs to be sorted numerically.

How do I sort multiple files in Linux?

Another way to sort multiple files simultaneously is to pipe the find command output to sort and use the --files0-from= option in the sort command. Specify the -print0 option in find to end file name with the NUL character and ensure the program properly reads the file list.

How do you sort Uniq?

What are sort and uniq? Ordering and manipulating data in Linux-based text files can be carried out using the sort and uniq utilities. The sort command orders a list of items both alphabetically and numerically, whereas the uniq command removes adjacent duplicate lines in a list.


2 Answers

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile which sorts the file by the string from the beginning of field 3 to the end of line (which is potentially unique).

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)
like image 349
andras Avatar answered Oct 16 '22 18:10

andras


The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.

like image 99
Clinton Pierce Avatar answered Oct 16 '22 20:10

Clinton Pierce