
Sort a file by the first (or second, or any other) column in Python

This seems like a very basic question, but I am new to Python, and after spending a long time trying to find a solution on my own, I thought it was time to ask some more advanced people!

So, I have a file (sample):

ENSMUSG00000098737  95734911    95734973    3   miRNA
ENSMUSG00000077677  101186764   101186867   4   snRNA
ENSMUSG00000092727  68990574    68990678    11  miRNA
ENSMUSG00000088009  83405631    83405764    14  snoRNA
ENSMUSG00000028255  145003817   145032776   3   protein_coding
ENSMUSG00000028255  145003817   145032776   3   processed_transcript
ENSMUSG00000028255  145003817   145032776   3   processed_transcript
ENSMUSG00000098481  38086202    38086317    13  miRNA
ENSMUSG00000097075  126971720   126976098   7   lincRNA
ENSMUSG00000097075  126971720   126976098   7   lincRNA

and I need to write a new file with all the same information, but sorted by the first column.

What I use so far is :

lines = open(my_file, 'r').readlines()
output = open("intermediate_alphabetical_order.txt", 'w')

for line in sorted(lines, key=itemgetter(0)):
    output.write(line)

output.close()

It doesn't raise any error, but it just writes the output file in exactly the same order as the input file.

I know it is certainly a very basic mistake, but it would be amazing if some of you could tell me what I'm doing wrong!

Thanks a lot!

Edit

I am having trouble with the way I read the file, so answers that assume the data is already loaded into an array don't really help.

asked Dec 08 '15 14:12 by Tiana


3 Answers

The problem you're having is that you're not turning each line into a list. When you read in the file, each line comes back as one whole string, so itemgetter(0) returns the first character of the line rather than the first column. That character is always the same in your input, 'E'. Because every key is equal and Python's sort is stable, the lines keep their original order, which is why the output looks identical to the input.

To sort by the first column, you need to split each line and use only the first field as the key. So your loop should be this:

for line in sorted(lines, key=lambda line: line.split()[0]):

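split will turn your line into a list, and then the first column is taken from that list. Called with no arguments, split breaks the line on any run of whitespace, so it works whether the columns are separated by tabs or by spaces.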
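Putting the pieces together, a minimal sketch of the whole read, sort, write cycle might look like the following. The function names and the `column` parameter are made up for illustration, and the sketch assumes whitespace-separated columns; blank lines are skipped because they have no first field to sort on.

```python
def sort_lines_by_column(lines, column=0):
    """Return the lines sorted by one whitespace-separated column (compared as text)."""
    return sorted(lines, key=lambda line: line.split()[column])


def sort_file_by_column(in_path, out_path, column=0):
    """Hypothetical helper: read in_path, sort its lines, write them to out_path."""
    with open(in_path) as src:
        # Drop the newline from each line and skip blank lines entirely.
        lines = [line.rstrip("\n") for line in src if line.strip()]
    with open(out_path, "w") as dst:
        for line in sort_lines_by_column(lines, column):
            dst.write(line + "\n")


# Three lines from the question's sample, used as in-memory input.
sample = [
    "ENSMUSG00000098737  95734911    95734973    3   miRNA",
    "ENSMUSG00000077677  101186764   101186867   4   snRNA",
    "ENSMUSG00000028255  145003817   145032776   3   protein_coding",
]
sorted_sample = sort_lines_by_column(sample)
```

With the question's file, a call such as sort_file_by_column(my_file, "intermediate_alphabetical_order.txt") should produce the sorted copy.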

answered Nov 13 '22 16:11 by SuperBiasedMan


If your input file is tab-separated, you can also use the csv module.

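import csv
from operator import itemgetter

# newline="" is what the csv module recommends when opening files
with open("t.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    # itemgetter(0) picks the first column of each parsed row as the sort key
    for line in sorted(reader, key=itemgetter(0)):
        print(line)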

This sorts by the first column.

Change the number in

key=itemgetter(0)

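to sort by a different column. Note that csv.reader yields every field as a string, so a numeric column is compared as text unless you convert it in the key.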
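As a rough illustration of sorting by another column, the sketch below sorts on the second column both as text and as integers. The `sort_rows` helper and its `numeric` flag are hypothetical names, and the two rows are copied from the question's sample with tabs assumed as the separator.

```python
import csv
import io


def sort_rows(rows, column, numeric=False):
    """Sort parsed rows by one column; compare as integers when numeric=True."""
    if numeric:
        return sorted(rows, key=lambda row: int(row[column]))
    return sorted(rows, key=lambda row: row[column])


# Two rows from the question's sample, assumed to be tab-separated.
sample = (
    "ENSMUSG00000098737\t95734911\t95734973\t3\tmiRNA\n"
    "ENSMUSG00000077677\t101186764\t101186867\t4\tsnRNA\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# As text, "101186764" sorts before "95734911" because "1" < "9".
as_text = sort_rows(rows, 1)
# As integers, 95734911 sorts before 101186764.
as_numbers = sort_rows(rows, 1, numeric=True)
```

The integer conversion only makes sense for columns that contain nothing but digits; int() raises ValueError on anything else.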

answered Nov 13 '22 17:11 by Revan


Same idea as SuperBiasedMan's answer, but I prefer this approach because other sort orders (for example: if the first column matches, sort by the second, then the third, etc.) are easier to implement. Sorting the split lines directly already compares them column by column, as strings.

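with open(my_file) as f:
    # Split each line into its columns; the last field keeps its trailing newline
    lines = [line.split(' ') for line in f]

with open("result.txt", 'w') as output:
    # Lists compare element by element: first column, then second, and so on
    for line in sorted(lines):
        output.write(' '.join(line))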
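
The tie-breaking order described in this answer (first column, then the next one when the first matches) can also be spelled out with a tuple key, which allows a different comparison per column. This is only a sketch: the function name is invented, the rows are made-up placeholder data rather than values from the question, and it assumes the second column always holds an integer.

```python
def sort_by_first_then_second(rows):
    """Sort by column 0 as text, breaking ties on column 1 as an integer."""
    return sorted(rows, key=lambda row: (row[0], int(row[1])))


# Made-up rows: two share the same first column, so the second column decides.
rows = [
    ["b", "2"],
    ["a", "10"],
    ["a", "9"],
]
ordered = sort_by_first_then_second(rows)
```

Sorting the lists without a key would put "10" before "9" for the two "a" rows, since the fields would then be compared as strings.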

answered Nov 13 '22 16:11 by Soronbe