Python regular expression with codons

Tags:

python

regex

I'm struggling with a regular expression to search for sequences that start with 'TAA', continue with a whole number of triplets (groups of 3 characters), and end with 'TAA' again.

I tried the following:

re.findall('TAA...+?TAA',seq), which of course does not restrict the middle part to triplets, but does give me whole sequences.

re.findall('TAA([ATGC]{3})+?TAA' , seq), however, gives me a list of single triplets as output:

'AGG', 'TCT', 'GTG', 'TGG', 'TGA', 'TAT',
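To see where those single triplets come from, here is a small sketch on a made-up toy sequence (the value of `seq` is an assumption, not the asker's data). When the pattern has a capturing group, `re.findall` returns the group's text rather than the whole match, and a repeated group only remembers its last repetition:

```python
import re

# Hypothetical toy sequence: a GGG codon and a CCC codon between two TAAs
seq = "TAAGGGCCCTAA"

# With a capturing group, findall returns the group's text, not the whole match,
# and the repeated group keeps only its last repetition (CCC overwrites GGG).
print(re.findall('TAA([ATGC]{3})+?TAA', seq))   # ['CCC']

# The whole match is still available as group(0) of a match object
m = re.search('TAA([ATGC]{3})+?TAA', seq)
print(m.group(0))                                # TAAGGGCCCTAA
```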

Any ideas? I could of course check each match from

re.findall('TAA...+?TAA',seq)

for len(match) % 3 == 0, but how do I do this with a regex?
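The length-check workaround from the question could look roughly like this; `in_frame_matches` is a hypothetical helper name and the example sequence is made up. Since the two TAAs add 6 characters, `len(match) % 3 == 0` is the same as the middle part being whole triplets. One caveat: the lazy loose pattern stops at the first TAA it reaches, even an out-of-frame one, and consumes that stretch, so filtering afterwards can drop a match without ever finding the longer in-frame one.

```python
import re

def in_frame_matches(seq):
    """Hypothetical helper: filter the loose TAA...TAA matches by length."""
    loose = re.findall('TAA...+?TAA', seq)
    # The two TAAs contribute 6 characters, so len % 3 == 0 on the whole
    # match means the middle part is a whole number of triplets.
    return [m for m in loose if len(m) % 3 == 0]

# Made-up example: the lazy match stops at an out-of-frame TAA at index 7
seq = "TAAGCGCTAAGGTAA"
print(re.findall('TAA...+?TAA', seq))  # ['TAAGCGCTAA'] (length 10)
print(in_frame_matches(seq))           # [] although the whole seq is in frame
```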

Jasper asked Mar 08 '12 13:03


1 Answer

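You want a non-capturing group. When the pattern contains a capturing group, re.findall returns the text of that group instead of the whole match, and a repeated group only keeps its last repetition, which is why you get single triplets.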

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
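A quick way to see the difference, using a made-up toy sequence: `Pattern.groups` counts only capturing groups, and when that count is zero `findall` returns whole matches.

```python
import re

capturing = re.compile('TAA([ATGC]{3})+?TAA')
non_capturing = re.compile('TAA(?:[ATGC]{3})+?TAA')

# Pattern.groups counts capturing groups only; (?:...) does not count
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0

# With no capturing groups, findall returns the whole matches
seq = "TAAGGGCCCTAA"  # hypothetical toy sequence
print(capturing.findall(seq))      # ['CCC']
print(non_capturing.findall(seq))  # ['TAAGGGCCCTAA']
```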

Try this:

re.findall('TAA(?:[ATGC]{3})+?TAA' , seq)
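If you also need to know where each sequence sits, the same pattern works with `finditer`, which yields match objects instead of strings. A sketch on a made-up sequence with an out-of-frame TAA at index 7, which the triplet pattern steps over:

```python
import re

# Whole triplets between two TAAs, matched lazily (same pattern as above)
pattern = re.compile('TAA(?:[ATGC]{3})+?TAA')

# Hypothetical sequence; the TAA starting at index 7 is out of frame
seq = "TAAGCGCTAAGGTAA"

# finditer yields match objects, so start/end positions are available too
for m in pattern.finditer(seq):
    print(m.start(), m.end(), m.group())   # 0 15 TAAGCGCTAAGGTAA
```

Like findall, finditer returns non-overlapping matches, so two sequences sharing a TAA are not both reported.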
Mark Byers answered Oct 22 '22 14:10