Regex: match permutations of DNA sequence

How do I make a regular expression to evaluate the following string?

TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC

and extract the pattern CTCCT.

The pattern must be 3 C's and 2 T's in any order.

I tried /[C | T]{5}/, but it matches CCCCT and TCCCC.
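The attempt fails because [C | T] is a character class. It matches one character that is C, a space, "|" or T, and {5} only asks for five such characters in a row, with no limit on how many of each. Three C's and two T's can be arranged in only 10 orders (5 choose 2), so one regex-only workaround is to generate every arrangement and join them with "|". A minimal sketch in JavaScript, assuming uppercase input; arrangements and c3t2Regex are hypothetical helper names:

```javascript
// Build every distinct ordering of the given letters (e.g. "CCCTT" -> 10 strings).
function arrangements(letters) {
  if (letters.length <= 1) return [letters];
  const results = new Set(); // a Set drops duplicates caused by repeated letters
  for (let i = 0; i < letters.length; i++) {
    const rest = letters.slice(0, i) + letters.slice(i + 1);
    for (const tail of arrangements(rest)) {
      results.add(letters[i] + tail);
    }
  }
  return [...results];
}

// Hypothetical helper: one alternative per ordering, so CCCCT or TCCCC can never match.
function c3t2Regex() {
  return new RegExp(arrangements("CCCTT").join("|"));
}

const sequence = "TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC";
const match = sequence.match(c3t2Regex());
console.log(match[0], match.index); // CTCCT 25
```

Enumerating alternatives is only practical because the pattern is short; the number of orderings grows quickly with longer patterns.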

Thanks in advance.

Ryan Pace Sloan asked Jun 15 '16 at 00:06


People also ask

How can regex be used to find patterns in genetics?

Regular expressions (regex) in Python can help us find patterns in genetics. Regex is useful when we analyse biological sequence data, because we are very often looking for patterns in DNA, RNA or proteins. These sequences are just strings, and are therefore remarkably amenable to pattern analysis with regex.
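The same idea carries over to JavaScript, the language of the answer below. As a small illustration, a global regex with matchAll can list every position of a motif in a sequence string. The ATG start codon and the function name motifPositions are illustrative choices, and the motif is assumed to be plain letters (it is inserted into the regex unescaped):

```javascript
// Return the start index of every (possibly overlapping) occurrence of a motif.
function motifPositions(sequence, motif) {
  // A zero-width lookahead lets overlapping hits, such as "AAA" twice in "AAAA", be found.
  const pattern = new RegExp(`(?=${motif})`, "g");
  return [...sequence.matchAll(pattern)].map((m) => m.index);
}

const dna = "TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC";
console.log(motifPositions(dna, "ATG")); // [ 2, 31 ]
```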

How do you decode the polyglutamine repeat number using regex?

We can use regex to decipher the polyglutamine repeat number. The first step is to write a pattern that finds trinucleotide repeats whose count is above a set threshold. Because the codon CAA also encodes glutamine (in addition to CAG), the htt_pattern must use the | alternation operator to accept either codon.
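The htt_pattern referred to above is not shown on this page, so the following is not it. It is a minimal JavaScript sketch of the same approach: an alternation of CAG and CAA, repeated at least a threshold number of times. The function name findGlnRepeats, the threshold values and the example sequence are all made up for illustration, and the regex does not enforce a reading frame:

```javascript
// Find runs of glutamine codons (CAG or CAA) repeated at least `threshold` times.
// The regex scans the raw string, so it does not enforce a reading frame.
function findGlnRepeats(sequence, threshold) {
  // (?:CAG|CAA) is a non-capturing group; {threshold,} means "threshold or more".
  const pattern = new RegExp(`(?:CAG|CAA){${threshold},}`, "g");
  return [...sequence.matchAll(pattern)].map((m) => ({
    start: m.index,
    repeats: m[0].length / 3, // each codon is three nucleotides long
  }));
}

const example = "ATGCAGCAGCAACAGCCG"; // made-up sequence for illustration
console.log(findGlnRepeats(example, 3)); // [ { start: 3, repeats: 4 } ]
```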

What does a period mean in a regex?

A period (.) is a wildcard that matches any single character (by default, any character except a line break). If a protein kinase had the consensus sequence ‘RXYVHXFDEXK’, where X denotes any amino acid, then the regex ‘R.YVH.FDE.K’ would find candidate substrates. Be warned, however, that the period matches any character, even one that is not a letter.
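To make the warning concrete, here is a JavaScript sketch comparing the wildcard version with a stricter one in which X is limited to the 20 standard one-letter amino-acid codes. Both test strings are invented for illustration:

```javascript
// X in the consensus RXYVHXFDEXK: the 20 standard one-letter amino-acid codes.
const X = "[ACDEFGHIKLMNPQRSTVWY]";

const anyChar = /R.YVH.FDE.K/;                         // '.' accepts any character
const aminoAcid = new RegExp(`R${X}YVH${X}FDE${X}K`);  // only amino-acid letters

const peptide = "MSRAYVHGFDEMKL"; // made-up substrate
const junk = "R-YVH1FDE*K";       // not a real peptide

console.log(anyChar.test(peptide), aminoAcid.test(peptide)); // true true
console.log(anyChar.test(junk), aminoAcid.test(junk));       // true false
```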

What are square brackets used for in regex?

A pair of square brackets enclosing a list of characters matches any one of those characters (refer to Table 1). The real power of regex comes from combining these tools.
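As an illustration in JavaScript, combining the class [CT] with a quantifier finds runs of pyrimidines in the sequence from the question. It also shows why /[C | T]{5}/ behaves unexpectedly: inside brackets, spaces and "|" are literal characters, not alternation:

```javascript
// [CT] matches a single C or T (a pyrimidine); {5,} requires five or more in a row.
const pyrimidineRun = /[CT]{5,}/g;
const dna = "TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC";

for (const m of dna.matchAll(pyrimidineRun)) {
  console.log(m.index, m[0]);
}
// 8 TCCCCTC
// 25 CTCCT

// Inside brackets "|" and " " are literal, so [C | T] also accepts them.
console.log(/[C | T]/.test("|")); // true
```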


1 Answer

This isn't the type of problem that is easily solved with regular expressions. It can be solved fairly straightforwardly with a simple function, however:

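    // Returns the start index of the first 5-letter window that contains
    // exactly three C's and two T's (case-insensitive), or false if none does.
    function c3t2(str) {
      const lowerCaseStr = str.toLowerCase();
      // "let" keeps the loop counter local instead of creating a global.
      for (let index = 0; index + 5 <= lowerCaseStr.length; index++) {
        const chunk = lowerCaseStr.substring(index, index + 5);
        // Sorting puts every permutation of CCCTT into the same order: "ccctt".
        if (chunk.split("").sort().join("") === "ccctt") {
          return index;
        }
      }

      // A match at index 0 is falsy, so callers should compare with === false.
      return false;
    }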
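The function above returns only the first hit. If every occurrence is wanted, a variant can count the letters in each window instead of sorting them. A sketch under that assumption; allC3T2Positions is a hypothetical name, and comparing in uppercase is an arbitrary choice:

```javascript
// Report every start index whose 5-letter window holds exactly three C's and two T's.
// Counting letters avoids sorting each window; input case is ignored.
function allC3T2Positions(str) {
  const upper = str.toUpperCase();
  const positions = [];
  for (let index = 0; index + 5 <= upper.length; index++) {
    let c = 0;
    let t = 0;
    for (const ch of upper.slice(index, index + 5)) {
      if (ch === "C") c++;
      else if (ch === "T") t++;
    }
    // Five letters total, so 3 C's plus 2 T's leaves no room for anything else.
    if (c === 3 && t === 2) positions.push(index);
  }
  return positions;
}

console.log(allC3T2Positions("TGATGCCGTCCCCTCAACTTGAGTGCTCCTAATGCGTTGC")); // [ 25 ]
```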
Andrew Rueckert answered Sep 30 '22 at 05:09
