bash sort quoted csv files by numeric key

Tags: bash, sorting, csv

I have the following input csv file:

"aaa","1","xxx"
"ccc, Inc.","6100","yyy"
"bbb","609","zzz"

I want to sort numerically by the second column. I tried:

sort --field-separator=',' --key=2n

The problem is that, since all values are quoted, they don't sort correctly with the -n (numeric) option. Is there a solution?
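To see why, it helps to look at what `sort -t,` treats as field 2. This minimal bash sketch (the helper name `show_field2` is hypothetical) prints that field for each sample row using `cut`. The quotes are still attached, so `-n` finds no leading digits, and the comma inside `"ccc, Inc."` shifts that row's fields by one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field 2 exactly as a comma separator splits it.
show_field2() {
  cut -d, -f2
}

# Sample rows copied from the question.
printf '%s\n' \
  '"aaa","1","xxx"' \
  '"ccc, Inc.","6100","yyy"' \
  '"bbb","609","zzz"' | show_field2
# Output:
# "1"
#  Inc."
# "609"
```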

asked Jul 11 '14 at 00:07 by user121196



4 Answers

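There isn't going to be a really simple solution. If you make some reasonable assumptions (every value is quoted, no value contains the sequence "," itself, and the data contains no Control-A characters), then you could consider: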

sed 's/","/^A/g' input.csv |
sort -t'^A' -k 2n |
sed 's/^A/","/g'

This replaces each "," sequence with Control-A (shown as ^A in the code), uses that as the field delimiter for sort (a numeric sort on column 2), and then replaces the Control-A characters with "," again.

If you use bash, you can use the ANSI C quoting mechanism $'\1' to embed the control characters visibly into the script; you just have to finish the single-quoted string before the escape, and restart it afterwards:

sed 's/","/'$'\1''/g' input.csv |
sort -t$'\1' -k 2n |
sed 's/'$'\1''/","/g'

Alternatively, you can use double quotes instead of single quotes, but that gets messy because the pattern itself contains the double quotes you are replacing. Or you can simply type the Control-A characters verbatim; editors like vim will be happy to show them to you.
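One way to check the idea is to stop the pipeline after the first sed and look at field 2. This sketch (the function name `col2_after_sed` is hypothetical) does that with `cut`, using the same `$'\1'` delimiter. Each row yields a bare number, which is what `sort -k 2n` then compares.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace "," with Control-A (as in the answer),
# then print field 2 using Control-A as the delimiter.
col2_after_sed() {
  sed 's/","/'$'\1''/g' | cut -d $'\1' -f2
}

printf '%s\n' \
  '"aaa","1","xxx"' \
  '"ccc, Inc.","6100","yyy"' \
  '"bbb","609","zzz"' | col2_after_sed
# Output:
# 1
# 6100
# 609
```

The opening quote of the first value and the closing quote of the last value remain, but they sit in fields 1 and 3, so they don't affect the numeric key.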

answered Oct 11 '22 at 15:10 by Jonathan Leffler


A little trick, which uses a double quote as the separator:

sort --field-separator='"' --key=4 -n
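The trick works because splitting on " leaves the bare number in field 4. Field 1 is the empty string before the opening quote, field 2 is the first value, and field 3 is the comma between the first two values. This small bash sketch, with hypothetical function names, shows the split and wraps the same sort command so it reads from standard input.

```shell
#!/usr/bin/env bash
# With '"' as the separator,  "ccc, Inc.","6100","yyy"  splits into
#   f1=''  f2='ccc, Inc.'  f3=','  f4='6100'  f5=','  f6='yyy'  f7=''
# so the comma inside the first value no longer matters.

# Hypothetical helper: print field 4 of each line.
quoted_field4() {
  cut -d'"' -f4
}

# The answer's command, wrapped in a (hypothetical) function reading stdin.
sort_by_quoted_col2() {
  sort --field-separator='"' --key=4 -n
}

printf '%s\n' \
  '"aaa","1","xxx"' \
  '"ccc, Inc.","6100","yyy"' \
  '"bbb","609","zzz"' | sort_by_quoted_col2
# Output:
# "aaa","1","xxx"
# "bbb","609","zzz"
# "ccc, Inc.","6100","yyy"
```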

answered Oct 11 '22 at 17:10 by nicky_zs


For quoted CSV, use a language that has a proper CSV parser. Here is an example using Perl.

perl -MText::ParseWords -lne '
    chomp; 
    push @line, [ parse_line(",", 0, $_) ];
}{ 
    @line = sort { $a->[1] <=> $b->[1] } @line;
    for (@line) {
        local $" = qw(",");
        print qq("@$_");
    }
' file

Output:

"aaa","1","xxx"
"bbb","609","zzz"
"ccc, Inc.","6100","yyy"

Explanation:

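  • The -l switch already strips the newline from each input line, so the explicit chomp is redundant but harmless.
  • Using the core module Text::ParseWords, parse each quoted line with parse_line and store its fields, without the quotes, in an array of arrays.
  • After the last line has been read (the }{ trick closes the loop that -n wraps around the code, so what follows runs once at the end, like an END block), sort the array of arrays numerically on the second column and assign the result back to the original array.
  • For each row, set the list separator $" to "," and print the fields wrapped in a leading and trailing ", which reproduces the original format.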

answered Oct 11 '22 at 16:10 by jaypal singh


Dropping your example into a file called sort2.txt, I found the following works well:

sort -t'"' -k4n sort2.txt

Using sort with the following options (thank you for the refinements, Jonathan):

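  • -t'"': use a double quote as the field separator instead of the default blank-based splitting. -t takes a single character; it is wrapped in single quotes so the shell passes it through unchanged.
  • -k4: sort on the fourth field. Splitting on " makes the empty string before the opening quote field 1, the first value field 2 and the comma field 3, so the number lands in field 4.
  • -n: compare numerically.
  • sort2.txt: pass the file name directly; there is no need to pipe the file into sort.

Hope this helps!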
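The -k4 choice assumes each value is wrapped in exactly one pair of quotes. If a file also used doubled quotes inside a value (the CSV convention for a literal ", which does not appear in the question's sample), the extra quotes would shift the fields. The second row below is a made-up illustration, and `quoted_field4` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field 4 when splitting on '"'.
quoted_field4() {
  cut -d'"' -f4
}

# A row from the question, and a made-up row whose first value
# contains doubled quotes.
printf '%s\n' \
  '"aaa","1","xxx"' \
  '"say ""hi""","5","x"' | quoted_field4
# Output:
# 1
# hi
```

In the second row field 4 is hi rather than 5, so -n would find no leading digits there. This is the kind of input where a dedicated CSV parser is the safer choice.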

answered Oct 11 '22 at 15:10 by Bill