I have the following input csv file:
"aaa","1","xxx"
"ccc, Inc.","6100","yyy"
"bbb","609","zzz"
I wish to sort by the second column numerically. I tried:
sort --field-separator=',' --key=2n
The problem is that, since all values are quoted, the -n (numeric) option does not sort them correctly. Is there a solution?
For a plain CSV file whose fields are not quoted, sort -k1 -n -t, filename should do the trick. -k1 sorts by column 1 (use -k2,2 to sort by the second column only). -n sorts numerically instead of lexicographically (so "11" will not come before "2"). -t, sets the delimiter (what separates values in your file) to a comma, since your file is comma-separated.
To sort CSV data by multiple columns in Python, use the pandas sort_values() method. Sorting by multiple columns means that if the first column has repeated values, the order of those rows is decided by the second column passed to sort_values().
Another way of sorting CSV files is to apply the built-in sorted() function to the rows read with the csv module. The key function selects the column to sort on: a key that returns a single field sorts on one column, and a key that returns a tuple of fields sorts on several.
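As a minimal sketch of the sorted() approach, assuming Python 3 and only the standard library: the csv module parses the quoted fields (so the comma inside "ccc, Inc." is not treated as a separator), and sorted() orders the rows by the second column converted to int. The function name sort_csv_numeric and the inline sample data are illustrative and not part of the original question.

```python
import csv
import io

def sort_csv_numeric(text, column=1):
    """Return CSV text with rows sorted numerically on a 0-based column.

    Hypothetical helper for illustration; assumes every value in the
    chosen column is an integer.
    """
    rows = list(csv.reader(io.StringIO(text)))
    rows = sorted(rows, key=lambda row: int(row[column]))
    out = io.StringIO()
    # QUOTE_ALL restores the fully quoted style of the input file.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()

sample = '"aaa","1","xxx"\n"ccc, Inc.","6100","yyy"\n"bbb","609","zzz"\n'
print(sort_csv_numeric(sample), end="")
```

For a real file, the same function could be fed the result of reading the file as text; that wiring is left out here to keep the sketch self-contained.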
There isn't going to be a really simple solution. If you make some reasonable assumptions, then you could consider:
sed 's/","/^A/g' input.csv |
sort -t'^A' -k 2n |
sed 's/^A/","/g'
This replaces the "," sequence with Control-A (shown as ^A in the code), then uses that as the field delimiter in sort (the numeric sort on column 2), and then replaces the Control-A characters with "," again.
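The same three steps can be sketched in Python to make the idea concrete. This is only an illustration under the same assumption as the pipeline, namely that the sequence "," occurs only between fields; the function name is hypothetical.

```python
SEP = "\x01"  # Control-A, the same stand-in delimiter as in the pipeline

def sort_by_swapped_delimiter(lines, field=2):
    """Sort quoted CSV lines numerically on a 1-based field.

    int() needs a clean number, so this only works for fields other
    than the first and last, which keep their outer quote.
    """
    # Step 1: replace every "," sequence with Control-A.
    swapped = [line.replace('","', SEP) for line in lines]
    # Step 2: sort numerically on the chosen Control-A-separated field.
    swapped.sort(key=lambda line: int(line.split(SEP)[field - 1]))
    # Step 3: turn Control-A back into ",".
    return [line.replace(SEP, '","') for line in swapped]

lines = ['"aaa","1","xxx"', '"ccc, Inc.","6100","yyy"', '"bbb","609","zzz"']
for line in sort_by_swapped_delimiter(lines):
    print(line)
```

The comma inside "ccc, Inc." survives because it is never part of a "," sequence, which is the whole point of swapping the delimiter.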
If you use bash, you can use the ANSI C quoting mechanism $'\1' to embed the control characters visibly into the script; you just have to finish the single-quoted string before the escape, and restart it afterwards:
sed 's/","/'$'\1''/g' input.csv |
sort -t$'\1' -k 2n |
sed 's/'$'\1''/","/g'
Or play with double quotes instead of single quotes, but that gets messy because of the double quotes that you are replacing. Alternatively, you can simply type the control characters verbatim, and editors like vim will be happy to show them to you.
A little trick, which uses a double quote as the separator:
sort --field-separator='"' --key=4 -n
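To see why the key is field 4, it helps to look at how one line splits on the double quote. A quick check in Python (used here purely for illustration; sort does its own splitting in the command above):

```python
line = '"aaa","1","xxx"'
fields = line.split('"')
# Number the pieces from 1, the way sort numbers its fields.
for number, field in enumerate(fields, start=1):
    print(number, repr(field))
```

Field 1 is the empty string before the first quote, field 2 is aaa, field 3 is the comma, and field 4 holds the bare number with no quote in front of it, so the numeric comparison works. The trick relies on no field containing an embedded double quote.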
For a quoted csv, use a language that has a proper csv parser. Here is an example using perl.
perl -MText::ParseWords -lne '
chomp;
push @line, [ parse_line(",", 0, $_) ];
}{
@line = sort { $a->[1] <=> $b->[1] } @line;
for (@line) {
local $" = qw(",");
print qq("@$_");
}
' file
Output:
"aaa","1","xxx"
"bbb","609","zzz"
"ccc, Inc.","6100","yyy"
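Explanation:
Remove the newline from each input line using the chomp function.
Parse the quoted line with parse_line from the core module Text::ParseWords and store the fields, without their quotes, in an array of arrays.
In the END block (the code after }{), sort the array of arrays on the second column and assign it back to the original array.
For each row, set the list separator $" to "," and print the row with a preceding and trailing " to recreate the lines in the original format.
Dropping your example into a file called sort2.txt, I found the following to work well.
sort -t'"' -k4n sort2.txt
This uses sort with the following options (thank you for the refinements, Jonathan): -t'"' sets the double quote as the field separator, and -k4n sorts numerically on the fourth field, which holds the value of the second CSV column.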
Hope this helps!