I have a csv file that contains 6 columns.
I want to sort it by col #2 and then by col #3.
My current code is creating a blank file:
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
writer = csv.writer(final, delimiter='\t')
reader = csv.reader(f, delimiter=',')
_ = next(reader)
sorted1 = sorted(reader, key=lambda row: int(row[1]))
sorted2 = sorted(reader, key=lambda row: int(row[2]))
for row in sorted2:
writer.writerow(row)
What am I doing wrong?
The reason that your output file is empty is because
sorted2 = sorted(reader, key=lambda row: int(row[2]))
is trying to sort the data from reader
, but you've already read all the data in the previous sorting statement, so there's nothing left for the reader to read. However, you really don't want to re-sort the data from reader
, you want to re-sort the data in sorted1
, like this:
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
writer = csv.writer(final, delimiter='\t')
reader = csv.reader(f, delimiter=',')
_ = next(reader)
sorted1 = sorted(reader, key=lambda row: int(row[1]))
sorted2 = sorted(sorted1, key=lambda row: int(row[2]))
for row in sorted2:
writer.writerow(row)
OTOH, there's no need to do the sorting in two passes. You can do it in a single pass by changing the key function.
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
writer = csv.writer(final, delimiter='\t')
reader = csv.reader(f, delimiter=',')
_ = next(reader)
sorted2 = sorted(reader, key=lambda row: (int(row[1]), int(row[2])))
for row in sorted2:
writer.writerow(row)
That key function first compares items by their row[1]
values, and if those values are identical it then compares them by their row[2]
values. That may not give the ordering that you actually want. You may want to reverse the order of those tests:
key=lambda row: (int(row[2]), int(row[1]))
As Peter Wood mentions in the comments, Writer objects have a writerows
method that will write all the rows in one call. This is more efficient than writing the rows one by one in a for
loop.
BTW, there's no need to do this assignment:
_ = next(reader)
I guess it makes it clear that you're discarding the 1st row, but you could just write the call without performing an assignment:
next(reader)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With