Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort CSV by column name

I need to sort a CSV table by the column name. The data I get in CSV has various number of columns, but the column name is static. This is the data I have:

X,Blue,Black,Red,Green,Total
Thomas,2,0,0,0,2
Andy,0,1,0,0,1
Luise,0,0,2,1,3
Mark,1,0,0,1,2
Ronda,0,1,0,1,2

Pretty print:

  X     Blue    Black   Red Green  Total
Thomas   2       0       0    0     2
Andy     0       1       0    0     1
Luise    0       0       2    1     3
Mark     1       0       0    1     2
Ronda    0       1       0    1     2

Normally I could just sort by row[5] but if I get a table without any Greens, or a table with additional column - Yellow, it will no longer work.

Therefore the question is, how can I sort this CSV file by the column named Total?

Can only use core python modules, no pandas. :(

like image 810
That guy Avatar asked Dec 19 '17 12:12

That guy


People also ask

How do I sort a column by CSV in Python?

To sort CSV by multiple columns, use the sort_values() method. Sorting by multiple columns means if one of the columns has repeated values, then the sort order depends on the 2nd column mentioned under sort_values() method.


1 Answers

If you need to sort using a column name, it would be best to read your file using a Python csv.DictReader() object as follows:

import csv

with open('input.csv', 'r', newline='') as f_input:
    csv_input = csv.DictReader(f_input)
    data = sorted(csv_input, key=lambda row: (row['Total'], row['X']))

with open('output.csv', 'w', newline='') as f_output:    
    csv_output = csv.DictWriter(f_output, fieldnames=csv_input.fieldnames)
    csv_output.writeheader()
    csv_output.writerows(data)

This would then allow you to sort based on the Total column regardless of where it is. The csv.DictReader() reads each row as a dictionary, using the first row as a header. The header values can then be used as dictionary keys to access items rather than using positional values with a normal csv.reader(). So your first data row would be read as:

{'Total': '1', 'Blue': '0', 'Black': '1', 'Green': '0', 'X': 'Andy', 'Red': '0'}

csv_input can then be passed directly to sorted() to create a sorted list of row dictionaries. The key is the total field and then also the X field for the cases where the total is the same.

This would then write a sorted output.csv as follows:

X,Blue,Black,Red,Green,Total
Andy,0,1,0,0,1
Mark,1,0,0,1,2
Ronda,0,1,0,1,2
Thomas,2,0,0,0,2
Luise,0,0,2,1,3
like image 193
Martin Evans Avatar answered Oct 10 '22 23:10

Martin Evans