Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Biopython record extra 2 nucleotides either side of gene for +2,-2 reading frames

I'm looking for ambush stop codons. I've gotten my code to the point where I'm extracting the sequences I need from my embl files. However I'm a bit stumped on how to add two upstream and two downstream nucleotides so I end up having -2,-1, 0, 1, 2 reading frames.

for rec in SeqIO.parse("CP002701.embl", "embl"):
    if rec.features:
        for feature in rec.features:
            if feature.type == "CDS":
                print(feature.location)
                print(feature.qualifiers["protein_id"])
                print(feature.location.extract(rec).seq)      

is the section I want to change but not sure how to change .location in order to select the extra 4 bases that I am interested in.

like image 872
user7550844 Avatar asked Nov 09 '22 02:11

user7550844


1 Answers

@user7550844 (OP) wrote on Feb 12 '17 at 15:46:

After some help from mofoniusrex on reddit here's a solution that works:

for rec in SeqIO.parse("CP002701.embl", "embl"):
if rec.features:
    for feature in rec.features:
        if feature.type == "CDS":
            pad=2
            newloc = SeqFeature.FeatureLocation( feature.location.start - pad,feature.location.end + pad)
            newfeature=SeqFeature.SeqFeature(location=newloc,
                 type=feature.type,
                 strand=feature.strand,
                 ref=feature.ref,
                 ref_db=feature.ref_db)
            newfeature.qualifiers = feature.qualifiers
            print(newfeature.qualifiers)
            print(newfeature.location)
            print(newfeature.location.extract(rec).seq)
like image 67
merv Avatar answered Dec 25 '22 21:12

merv