I'm trying to construct a regular expression that will match a repeating DNA sequence of 2 characters. These characters can be the same.
The regex should match a repeating sequence of 2 characters at least 3 times and, here are some examples:
regex should match on:
and should not match on:
So far I've come up with the following regular expressions:
[ACGT]{2}
this captures any sequence consisting of exactly two characters (A, C, G or T). Now I want to repeat this pattern at least three times, so I tried the following regular expressions:
[ACGT]{2}{3,}
([ACGT]{2}){3,}
Unfortunately, the first one raises a 'multiple repeat' error (Python), while the second one will simply match any sequence with 6 characters consisting of A, C, G and T.
Is there anyone that can help me out with this regular expression? Thanks in advance.
You could perhaps make use of backreferences.
([ATGC]{2})\1{2,}
\1
is the backreference referring to the first capture group and will be what you have captured.
regex101 demo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With