how to read a fasta file in python?

Tags:

python

fasta

I'm trying to read a FASTA file, find a specific motif (a string), and print out the sequence and the number of times the motif occurs. A FASTA file is just a series of sequences (strings), each starting with a header line; a header, which marks the start of a new sequence, begins with ">". The line immediately after the header holds the sequence of letters. I'm not done with the code, but so far I have this, and it gives me this error:

AttributeError: 'str' object has no attribute 'next'

I'm not sure what's wrong here.

import re

header=""
counts=0
newline=""

f1=open('fpprotein_fasta(2).txt','r')
f2=open('motifs.xls','w')
for line in f1:
    if line.startswith('>'):
        header=line
        #print header
        nextline=line.next()
        for i in nextline:
            motif="ML[A-Z][A-Z][IV]R"
            if re.findall(motif,nextline):
                counts+=1
                #print (header+'\t'+counts+'\t'+motif+'\n')
        fout.write(header+'\t'+counts+'\t'+motif+'\n')

f1.close()
f2.close()

asked Dec 14 '13 07:12

user3098683

2 Answers

The error is likely coming from the line:

nextline=line.next()

line is the string you have already read, and strings have no next() method, which is exactly what the AttributeError is telling you.

Part of the problem is that you're trying to mix two different ways of reading the file: you are iterating over the lines with for line in f1 while also trying to fetch the following line with <handle>.next().
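One way to get the line that follows each header, without calling next() on a string and without a second reading style inside the loop, is to let a single iteration over the file handle consume two lines at a time. This is an illustrative sketch, not part of the original answer. It assumes Python 3 and assumes every sequence fits on exactly one line, as the question describes. It uses io.StringIO in place of the real file. The function name header_sequence_pairs and the sample records are hypothetical.

```python
import io


def header_sequence_pairs(handle):
    """Yield (header, sequence) pairs from an open FASTA handle.

    Assumes strictly alternating lines: one '>' header line followed by
    exactly one sequence line. A trailing header with no sequence line
    after it is silently dropped by zip().
    """
    # zip() pulls from the *same* file iterator twice per step, so every
    # step consumes a header line and then the sequence line right after it.
    for header, sequence in zip(handle, handle):
        if not header.startswith('>'):
            raise ValueError("expected a header line, got: " + header)
        yield header.rstrip('\n'), sequence.rstrip('\n')


# In-memory stand-in for open('fpprotein_fasta(2).txt'); the records are made up.
sample = io.StringIO(">seq1\nMLAAIRKK\n>seq2\nGGGG\n")
for header, sequence in header_sequence_pairs(sample):
    print(header, sequence)
```

The loop only ever advances the file's own iterator, so nothing is called on the line string itself.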

Also, if you are working with FASTA files, I recommend using Biopython: it makes working with collections of sequences much easier. Chapter 14, on motifs, should be especially relevant to you. You will likely need to learn more Python to achieve what you want, but if you're going to do much more bioinformatics than your example here shows, it's definitely worth the investment of time.
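Biopython's Bio.SeqIO.parse(handle, "fasta") reader handles one practical detail for you: real FASTA files often wrap a single sequence over several lines, so "the line after the header" is not always the whole sequence. The following is a rough, stdlib-only illustration of that idea, not Biopython's own code. It is a minimal reader that joins wrapped lines into one sequence per record. The name read_fasta and the sample data are hypothetical.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for each record in a FASTA handle.

    Sequence lines that follow one header are joined, so wrapped records
    come back as a single string. Lines before the first header are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)  # finish the previous record
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)  # the last record has no header after it


# Made-up example: the first record is wrapped over two lines.
sample = io.StringIO(">prot1 demo\nMLAAIR\nKKML\n>prot2 demo\nGGGG\n")
for header, sequence in read_fasta(sample):
    print(header, len(sequence), sequence)
```

Because the whole sequence is rebuilt before searching, a motif that straddles a line break is still found. A line-by-line re.findall would miss it.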

answered Sep 22 '22 20:09

iainmcgin

This might help get you going in the right direction:

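import re

def parse(fasta, outfile):
    motif = "ML[A-Z][A-Z][IV]R"
    header = None
    count = 0
    with open(fasta, 'r') as fin, open(outfile, 'w') as fout:
        for line in fin:
            # Strip the trailing newline so it doesn't end up in the middle of an output row
            line = line.rstrip()
            if line.startswith('>'):
                # A new record starts: write out the count for the previous one
                if header is not None:
                    fout.write(header + '\t' + str(count) + '\t' + motif + '\n')
                header = line
                count = 0
            else:
                # Sequence line: add the number of (non-overlapping) motif matches on it
                matches = re.findall(motif, line)
                count += len(matches)
        # The last record has no header after it, so write it out here
        if header is not None:
            fout.write(header + '\t' + str(count) + '\t' + motif + '\n')

if __name__ == '__main__':
    parse("fpprotein_fasta(2).txt", "motifs.xls")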
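The answer above counts hits with re.findall, which only counts non-overlapping matches, and the question also asks to print the matching sequence. The sketch below shows one possible extension. It assumes the whole sequence of a record is already available as one string. It returns each hit with its start position and can optionally count overlapping hits by wrapping the motif in a zero-width lookahead. The function name find_motif and the sample string are hypothetical.

```python
import re

MOTIF = "ML[A-Z][A-Z][IV]R"


def find_motif(sequence, motif=MOTIF, overlapping=False):
    """Return a list of (start, matched_text) pairs for every motif hit."""
    if overlapping:
        # A zero-width lookahead consumes nothing, so the search can restart
        # one residue later and report hits that share residues.
        pattern = re.compile("(?=(" + motif + "))")
    else:
        # Same hits re.findall(motif, sequence) would count: non-overlapping only.
        pattern = re.compile("(" + motif + ")")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(sequence)]


seq = "MLMLIRVR"  # made-up sequence in which two hits overlap
print(find_motif(seq))                    # [(0, 'MLMLIR')]
print(find_motif(seq, overlapping=True))  # [(0, 'MLMLIR'), (2, 'MLIRVR')]
```

To get the count the answer writes out, take len(find_motif(sequence)). The non-overlapping mode yields the same number as len(re.findall(motif, sequence)).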

answered Sep 21 '22 20:09

Arnaud P
