Looping over lines with Python

Question

So I have a file that contains this:

SequenceName 4.6e-38 810..924
SequenceName_FGS_810..924 VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
SequenceName 1.6e-38 887..992
SequenceName_GYQ_887..992 PLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH

I want my program to read only the lines that contain these protein sequences. So far I have this, which skips the first line and reads the second one:

handle = open(filename, "r")
handle.readline()
linearr = handle.readline().split()
handle.close()

fnamealpha = fname + ".txt"
handle = open(fnamealpha, "w")
handle.write(">%s\n%s\n" % (linearr[0], linearr[1]))
handle.close()

But it only processes the first sequence, and I need it to process every line that contains a sequence. So I need a loop. How can I do that? Saving the result to a txt file is really important too, so I need a way to combine these two objectives. My output with the above code is:

>SequenceName_810..924
VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH

modocache · Accepted Answer

Okay, I think I understand your question--you want to iterate over the lines in the file, right? But only the second line in the sequence--the one with the protein sequence--matters, correct? Here's my suggestion:

# the `with` context manager closes the file for you, even if an error occurs
with open(filename, 'r') as handle:
    for line in handle:
        # only the "name sequence" lines start with 'SequenceName_'
        if line.startswith('SequenceName_'):
            print(line.split())
            # Write to file, etc.

My reasoning is that you're only interested in lines that start with SequenceName_###.
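To combine the loop with the part that saves to a txt file, something like the following might work. It is only a sketch: the function name write_fasta and its parameters in_path, out_path and prefix are made up for illustration, and it assumes every sequence line has exactly two whitespace-separated fields (the name and the sequence), as in the sample above.

```python
def write_fasta(in_path, out_path, prefix="SequenceName_"):
    """Copy every '<name> <sequence>' line from in_path to out_path.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the number of records written.
    """
    count = 0
    # Open both files at once; `with` closes them even if an error occurs.
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            # Assumption: a sequence line has exactly two fields and its
            # name starts with the prefix; header lines have three fields.
            if len(fields) == 2 and fields[0].startswith(prefix):
                dst.write(">%s\n%s\n" % (fields[0], fields[1]))
                count += 1
    return count
```

Calling write_fasta(filename, fname + ".txt") would then write one ">name" line followed by one sequence line for each match, all into a single output file. If you need one output file per sequence instead, the open(out_path, "w") call would have to move inside the loop.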

Tags:

python

loops

John
