How do I make a regular expression to evaluate the following string?
TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC
and extract the pattern CTCCT.
The pattern must be 3 C's and 2 T's in any order.
I tried /[C | T]{5}/, but it also matches CCCCT and TCCCC.
Thanks in Advance.
Regular expressions (regex) in Python can help us find patterns in genetics. They are useful when analysing biological sequence data, because we are very often looking for patterns in DNA, RNA or proteins. These sequence data types are just strings and are therefore remarkably amenable to pattern analysis using regex.
We can use regex to determine the polyglutamine repeat number. This first involves writing a pattern that finds runs of tri-nucleotide repeats above a set threshold. The codon CAA also encodes glutamine (alongside CAG); therefore, in the htt_pattern we must use the | alternation operator so that either codon is accepted.
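As a rough illustration of the alternation idea, here is a minimal sketch, written in JavaScript to match the code in the answer further down. The names `MIN_REPEATS` and `polyQPattern`, the threshold of 5, and the sequence are all made up for this example; this is not the htt_pattern itself.

```javascript
// Hypothetical threshold: report runs of at least this many glutamine codons.
const MIN_REPEATS = 5;

// (?:CAG|CAA) accepts either glutamine codon; {5,} requires 5 or more in a row.
const polyQPattern = new RegExp(`(?:CAG|CAA){${MIN_REPEATS},}`, "g");

// Made-up sequence for illustration only (not real huntingtin data).
const polyQSeq = "ATGGCG" + "CAGCAGCAACAGCAGCAG" + "CCGCCA";

for (const match of polyQSeq.matchAll(polyQPattern)) {
  const repeats = match[0].length / 3; // each codon is 3 bases long
  console.log(`run of ${repeats} codons at index ${match.index}`);
}
// prints: run of 6 codons at index 6
```

Note that this pattern is not aware of the reading frame: it reports any run of CAG/CAA triplets wherever the run starts.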
A period (.) is a wildcard that matches any character. If a protein kinase had the consensus sequence ‘RXYVHXFDEXK’, where X denotes any amino acid, then the regex ‘R.YVH.FDE.K’ would succeed in searching for substrates. However, a word of warning: the period matches any character (by default, anything except a newline), including characters that are not letters.
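The warning about the period can be seen in a short sketch. The test strings and the names `wildcardPattern` and `residuePattern` are invented for illustration, and the stricter class assumes the 20 standard one-letter amino acid codes.

```javascript
// The period accepts any character at the X positions.
const wildcardPattern = /R.YVH.FDE.K/;

// Stricter alternative: only the 20 standard one-letter amino acid codes.
const residuePattern = /R[ACDEFGHIKLMNPQRSTVWY]YVH[ACDEFGHIKLMNPQRSTVWY]FDE[ACDEFGHIKLMNPQRSTVWY]K/;

console.log(wildcardPattern.test("RAYVHGFDEQK")); // true: a plausible peptide
console.log(wildcardPattern.test("R1YVH-FDE K")); // true: digit, dash and space also match
console.log(residuePattern.test("RAYVHGFDEQK"));  // true
console.log(residuePattern.test("R1YVH-FDE K"));  // false: non-residue characters rejected
```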
A pair of square brackets with a list of characters inside them can represent any one of these characters (refer to Table 1). The real power of regex is exploited when these tools are used together.
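A small sketch of how square brackets behave, using the sequence from the question. It suggests why the attempted /[C | T]{5}/ is too permissive: inside brackets the space and the | are ordinary characters, and a character class restricts which letters may appear, not how many of each. The variable names are illustrative only.

```javascript
const questionSeq = "TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC";

// Inside brackets, "|" and " " are literal characters, not operators.
const spacedClass = /[C | T]/;
console.log(spacedClass.test("|")); // true
console.log(spacedClass.test(" ")); // true

// [CT]{5} means "any five characters, each a C or a T".
// It says nothing about needing exactly three C's and two T's.
const cOrT = /[CT]{5}/g;
console.log(questionSeq.match(cOrT)); // two matches: TCCCC and CTCCT
```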
This isn't the type of problem that is easily solved using regular expressions. It can, however, be solved fairly straightforwardly with a simple function:
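// Returns the index of the first 5-character window made of three C's and
// two T's (in any order, case-insensitive), or false if there is none.
function c3t2(str) {
  var lowerCaseStr = str.toLowerCase();
  // Declare index with var so it does not become an implicit global.
  for (var index = 0; index + 5 <= lowerCaseStr.length; index++) {
    var substring = lowerCaseStr.substring(index, index + 5);
    var chars = substring.split("");
    // Sorting the letters makes every valid arrangement equal to "ccctt".
    if (chars.sort().join("") === "ccctt") {
      return index;
    }
  }
  // Check the result with === false: a match at index 0 is also falsy.
  return false;
}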
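For a pattern as short as this one, a regular expression can still do the job if every arrangement is written out by hand. The sketch below lists all ten orderings of three C's and two T's as alternatives. The name `c3t2Pattern` is made up here, and the approach becomes unwieldy quickly for longer patterns, which is why the function above is usually the easier route.

```javascript
// All 10 orderings of three C's and two T's, joined with alternation.
const c3t2Pattern = /CCCTT|CCTCT|CCTTC|CTCCT|CTCTC|CTTCC|TCCCT|TCCTC|TCTCC|TTCCC/i;

const dnaSeq = "TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC";
const found = c3t2Pattern.exec(dnaSeq);

if (found !== null) {
  console.log(found.index, found[0]); // prints: 25 CTCCT
}
```

Unlike /[CT]{5}/, this skips TCCCC and CCCCT, because neither is one of the listed orderings.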