I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.
Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?
Note: I have ruled out Perl because of the file size potential. It would be a last resort.
Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.).
How to sort by number. To sort by number, pass the -n option to sort. This sorts from the lowest number to the highest and writes the result to standard output. For example, a file listing items of clothing, with a number at the start of each line, can be sorted numerically this way.
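A minimal sketch of the difference between the default text comparison and -n. The file name clothes.txt and its contents are made up for illustration, and LC_ALL=C is set only so the ordering is the same on every system.

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

workdir=$(mktemp -d)
# Hypothetical sample data: an item count at the start of each line.
printf '%s\n' '10 shirts' '2 hats' '33 socks' '4 coats' > "$workdir/clothes.txt"

# Default sort compares text, so "10" lands before "2".
sorted_text=$(sort "$workdir/clothes.txt")

# -n compares the leading number by value.
sorted_numeric=$(sort -n "$workdir/clothes.txt")

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$sorted_text" "$sorted_numeric"
```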
Another way to sort multiple files at once is to pipe the output of find into sort and use sort's --files0-from= option (a GNU extension). Pass -print0 to find so that each file name ends with a NUL character and sort reads the file list correctly.
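As a rough illustration, assuming GNU find and GNU sort are installed, the pipeline could look like this. The directory and the two .txt files are throwaway examples created just for the demonstration.

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

workdir=$(mktemp -d)
# Two hypothetical input files.
printf '%s\n' pear apple > "$workdir/a.txt"
printf '%s\n' fig cherry > "$workdir/b.txt"

# find prints NUL-terminated names; "-" makes sort read that list from stdin.
# sort then treats the lines of all listed files as one input.
merged=$(find "$workdir" -type f -name '*.txt' -print0 | sort --files0-from=-)

printf '%s\n' "$merged"
```

NUL-terminated names keep the pipeline safe for file names that contain spaces or newlines.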
What are sort and uniq? Ordering and manipulating data in Linux-based text files can be carried out using the sort and uniq utilities. The sort command orders a list of items both alphabetically and numerically, whereas the uniq command removes adjacent duplicate lines in a list.
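Because uniq only looks at neighbouring lines, it is usually run on sorted input. A small sketch with made-up input shows the difference:

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

# No two equal lines are adjacent here, so uniq alone changes nothing.
unsorted_uniq=$(printf '%s\n' b a b c a | uniq)

# Sorting first puts equal lines next to each other; uniq then keeps one of each.
sorted_uniq=$(printf '%s\n' b a b c a | sort | uniq)

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n' "$unsorted_uniq" "$sorted_uniq"
```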
Take care though:
If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:
sort -k 3,3 -k 2,2 < inputfile
Not this:
sort -k 3 -k 2 < inputfile
which sorts the file by the string from the beginning of field 3 to the end of the line (which is potentially unique, so the second key would rarely be used).
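The difference shows up with even two lines. In this sketch the file inputfile.txt and its contents are invented for the demonstration: field 3 is identical on both lines, so the result depends on what gets compared next.

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

workdir=$(mktemp -d)
# Hypothetical blank-separated data; field 3 is "1" on both lines.
printf '%s\n' 'x b 1 aaa' 'y a 1 zzz' > "$workdir/inputfile.txt"

# Keys limited to single fields: field 3 ties, so field 2 decides ("a" before "b").
bounded=$(sort -k 3,3 -k 2,2 < "$workdir/inputfile.txt")

# Open-ended key: "1 aaa" is compared with "1 zzz", so field 2 is never consulted.
unbounded=$(sort -k 3 -k 2 < "$workdir/inputfile.txt")

printf 'with -k 3,3 -k 2,2:\n%s\n\nwith -k 3 -k 2:\n%s\n' "$bounded" "$unbounded"
```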
-k, --key=POS1[,POS2]   start a key at POS1 (origin 1), end it at POS2 (default end of line)
The -k option is what you want.
-k 1.4,1.5n -k 1.14,1.15n
This uses character positions 4-5 of the first field (for fixed-width data, it's all one field) and sorts on them numerically as the first key.
The second key is characters 14-15, also in the first field.
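Here is a small sketch of those keys applied to fixed-width records. The record layout is an assumption made up for this example (columns 1-3 a code, 4-5 a number, 6-13 a name, 14-15 another number), and the sample lines contain no blanks, so each line is a single field. The second command mixes a numeric key with a plain text key, as the question asks.

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

workdir=$(mktemp -d)
# Hypothetical layout: cols 1-3 code, 4-5 number, 6-13 name, 14-15 number.
printf '%s\n' \
  'AAA10widgets_07' \
  'BBB02anchors_30' \
  'CCC10sprocket05' \
  'DDD02bracket_04' > "$workdir/records.txt"

# Numeric on columns 4-5, then numeric on columns 14-15.
by_two_numbers=$(sort -k 1.4,1.5n -k 1.14,1.15n "$workdir/records.txt")

# Numeric on columns 4-5, then text comparison on columns 6-13.
by_number_then_name=$(sort -k 1.4,1.5n -k 1.6,1.13 "$workdir/records.txt")

printf 'two numeric keys:\n%s\n\nnumeric then text:\n%s\n' \
  "$by_two_numbers" "$by_number_then_name"
```

Attaching n to an individual key, rather than passing -n globally, is what lets numeric and non-numeric keys coexist in one command.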
(edit)
Example (all I have is DOS/cygwin handy):
dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r
for the data:
12/10/2008  01:10 PM         1,564,990 outfile.txt
Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.
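When the fixed-width data contains blanks, how sort splits a line into fields starts to matter. One way to avoid relying on that, sketched here under the assumption that the character | never occurs in the data, is to pass it as the field separator with -t so the whole line is guaranteed to be field 1. The layout and sample lines are invented for the example.

```shell
export LC_ALL=C   # byte-wise collation, so results do not depend on the locale

workdir=$(mktemp -d)
# Hypothetical layout: cols 1-2 id, cols 4-5 right-aligned number, cols 7-12 name.
printf '%s\n' \
  'A1 12 pears' \
  'B2  3 apples' \
  'C3 12 apples' > "$workdir/stock.txt"

# "|" is assumed never to appear in the data, so every line is one field
# and the character offsets count from the start of the line.
by_position=$(sort -t '|' -k 1.4,1.5n -k 1.7,1.12 "$workdir/stock.txt")

printf '%s\n' "$by_position"
```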