Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to sort a csv file by two columns in python?

Tags:

python

I have a csv file that contains 6 columns.

I want to sort it by col #2 and then by col #3.

My current code is creating a blank file:

import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
        writer = csv.writer(final, delimiter='\t')
        reader = csv.reader(f, delimiter=',')
        _ = next(reader)
        sorted1 = sorted(reader, key=lambda row: int(row[1]))
        sorted2 = sorted(reader, key=lambda row: int(row[2]))
        for row in sorted2:
            writer.writerow(row)

What am I doing wrong?

like image 527
Lala Avatar asked Jan 28 '23 23:01

Lala


1 Answers

The reason that your output file is empty is because

sorted2 = sorted(reader, key=lambda row: int(row[2]))

is trying to sort the data from reader, but you've already read all the data in the previous sorting statement, so there's nothing left for the reader to read. However, you really don't want to re-sort the data from reader, you want to re-sort the data in sorted1, like this:

import csv

with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)
    sorted1 = sorted(reader, key=lambda row: int(row[1]))
    sorted2 = sorted(sorted1, key=lambda row: int(row[2]))
    for row in sorted2:
        writer.writerow(row)

OTOH, there's no need to do the sorting in two passes. You can do it in a single pass by changing the key function.

import csv

with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)
    sorted2 = sorted(reader, key=lambda row: (int(row[1]), int(row[2])))        
    for row in sorted2:
        writer.writerow(row)

That key function first compares items by their row[1] values, and if those values are identical it then compares them by their row[2] values. That may not give the ordering that you actually want. You may want to reverse the order of those tests:

key=lambda row: (int(row[2]), int(row[1])) 

As Peter Wood mentions in the comments, Writer objects have a writerows method that will write all the rows in one call. This is more efficient than writing the rows one by one in a for loop.

BTW, there's no need to do this assignment:

_ = next(reader)

I guess it makes it clear that you're discarding the 1st row, but you could just write the call without performing an assignment:

next(reader)
like image 144
PM 2Ring Avatar answered Feb 09 '23 00:02

PM 2Ring