This seems a very basic question, but I am new to Python, and after spending a long time trying to find a solution on my own, I thought it was time to ask some more experienced people!
So, I have a file (sample):
ENSMUSG00000098737 95734911 95734973 3 miRNA
ENSMUSG00000077677 101186764 101186867 4 snRNA
ENSMUSG00000092727 68990574 68990678 11 miRNA
ENSMUSG00000088009 83405631 83405764 14 snoRNA
ENSMUSG00000028255 145003817 145032776 3 protein_coding
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000098481 38086202 38086317 13 miRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
and I need to write a new file with all the same information, but sorted by the first column.
What I have so far is:
from operator import itemgetter

lines = open(my_file, 'r').readlines()
output = open("intermediate_alphabetical_order.txt", 'w')
for line in sorted(lines, key=itemgetter(0)):
    output.write(line)
output.close()
It doesn't raise any error, but the output file comes out exactly the same as the input file.
I know it is certainly a very basic mistake, but it would be amazing if some of you could tell me what I'm doing wrong!
Thanks a lot!
I am having trouble with the way I open the file, so answers that work on data already loaded into arrays don't really help.
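The problem you're having is that you're not turning each line into a list. When you read in the file, each line is a single string, so itemgetter(0) returns its first character. You're therefore sorting by the first character of each line, which is always the same in your input: 'E'. Since Python's sort is stable, lines with equal keys keep their original order, which is why the output is identical to the input.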
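A quick way to confirm this diagnosis (the two lines below are copied from the question's sample): on a string, itemgetter(0) returns the first character, and because sorted() is stable, keys that are all equal leave the input order untouched.

```python
from operator import itemgetter

# Two lines from the question's sample, in their original (unsorted) order
lines = [
    "ENSMUSG00000098737 95734911 95734973 3 miRNA\n",
    "ENSMUSG00000077677 101186764 101186867 4 snRNA\n",
]

# On a string, itemgetter(0) picks the first character, not the first column
first_chars = [itemgetter(0)(line) for line in lines]
print(first_chars)  # ['E', 'E']

# Every key is 'E' and sorted() is stable, so the order does not change
print(sorted(lines, key=itemgetter(0)) == lines)  # True
```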
To sort by the first column, you need to split each line and use only its first field. So your key should be:
for line in sorted(lines, key=lambda line: line.split()[0]):
split() turns the line into a list of whitespace-separated fields, and [0] takes the first one, i.e. the first column.
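Putting that key into the original read/sort/write loop, a minimal sketch might look like this. The function name sort_by_first_column is hypothetical, made up for illustration; the with blocks close both files automatically, and blank lines are skipped so that split()[0] never fails on an empty list.

```python
def sort_by_first_column(in_path, out_path):
    """Hypothetical helper: copy in_path to out_path with lines ordered
    by their first whitespace-separated field."""
    with open(in_path) as f:
        # splitlines() drops the newlines; blank lines are filtered out
        lines = [line for line in f.read().splitlines() if line.strip()]
    with open(out_path, "w") as output:
        for line in sorted(lines, key=lambda line: line.split()[0]):
            # Re-add the newline so every record ends up on its own line
            output.write(line + "\n")
```

Usage with the names from the question: sort_by_first_column(my_file, "intermediate_alphabetical_order.txt").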
If your input file is tab-separated, you can also use the csv module.
import csv
from operator import itemgetter

# newline="" is the recommended way to open files for the csv module
with open("t.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for line in sorted(reader, key=itemgetter(0)):
        print(line)
This sorts by the first column.
Change the index in key=itemgetter(0) to sort by a different column.
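One caveat worth checking when switching columns: every field read this way is a string, so a numeric column such as the start coordinate (index 1 in the sample) sorts character by character. A small sketch, assuming the space-separated sample data rather than tabs, that converts the field with int() inside the key:

```python
import csv

# Three rows from the question's sample (space-separated here, not tabs)
sample = [
    "ENSMUSG00000098737 95734911 95734973 3 miRNA",
    "ENSMUSG00000077677 101186764 101186867 4 snRNA",
    "ENSMUSG00000092727 68990574 68990678 11 miRNA",
]
# csv.reader accepts any iterable of strings, not only open files
rows = list(csv.reader(sample, delimiter=" "))

# String comparison looks at the first character: '1' < '6' < '9'
as_text = sorted(rows, key=lambda row: row[1])
# Numeric comparison after converting the field to int
as_numbers = sorted(rows, key=lambda row: int(row[1]))

print([row[1] for row in as_text])     # ['101186764', '68990574', '95734911']
print([row[1] for row in as_numbers])  # ['68990574', '95734911', '101186764']
```

The same applies to the chromosome column (index 3), where "11" would otherwise sort before "3".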
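Same idea as SuperBiasedMan's answer, but I prefer this approach because other sort orders (for example, if the first column matches, sort by the second, then the third, and so on) are easier to implement. Lists compare element by element, so sorting the split lines with no key already does exactly that: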
with open(my_file) as f:
    # Split each line into a list of fields
    lines = [line.split() for line in f]
with open("result.txt", "w") as output:
    # No key needed: lists are ordered by first field, then second, etc.
    for fields in sorted(lines):
        output.write(" ".join(fields) + "\n")
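When you want a specific column order or a numeric tie-break instead of the plain field-by-field comparison, a tuple key makes it explicit. A sketch using rows from the sample; the particular orderings (biotype then gene ID, or chromosome then start coordinate, both as numbers) are arbitrary choices for illustration:

```python
from operator import itemgetter

rows = [
    "ENSMUSG00000028255 145003817 145032776 3 protein_coding".split(),
    "ENSMUSG00000098737 95734911 95734973 3 miRNA".split(),
    "ENSMUSG00000092727 68990574 68990678 11 miRNA".split(),
    "ENSMUSG00000097075 126971720 126976098 7 lincRNA".split(),
]

# Plain list comparison: first field, then second, and so on (as strings)
by_fields = sorted(rows)

# itemgetter with several indices returns a tuple: (biotype, gene ID)
by_type_then_id = sorted(rows, key=itemgetter(4, 0))

# Tuple key with conversions: chromosome as a number, then start coordinate
by_position = sorted(rows, key=lambda row: (int(row[3]), int(row[1])))

print([row[0] for row in by_position])
# ['ENSMUSG00000098737', 'ENSMUSG00000028255', 'ENSMUSG00000097075', 'ENSMUSG00000092727']
```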