Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you match even numbers of letter or odd numbers of letter using regexp for mysql

Tags:

regex

mysql

Does anyone know how to match even numbers and odd numbers of letter using regexp in mysql? i need to match like a even number of A's followed by an odd number of G's and then at least one TC? For example: acgtccAAAAGGGTCatg would match up. It's something for dna sequencing

like image 616
thunderb0lt Avatar asked Apr 07 '11 13:04

thunderb0lt


People also ask

How do I match a regex pattern?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

Which regex is applicable for alphabets?

[A-Za-z] will match all the alphabets (both lowercase and uppercase).

Does * match everything in regex?

Throw in an * (asterisk), and it will match everything. Read more. \s (whitespace metacharacter) will match any whitespace character (space; tab; line break; ...), and \S (opposite of \s ) will match anything that is not a whitespace character.


1 Answers

An even number of A's can be expressed as (AA)+ (one or more instance of AA; so it'll match AA, AAAA, AAAAAA...). An odd number of Gs can be expressed as G(GG)* (one G followed by zero or more instances of GG, so that'll match G, GGG, GGGGG...).

Put that together and you've got:

/(AA)+G(GG)*TC/

However, since regex engines will try to match as much as possible, this expression will actually match a substring of AAAGGGTC (ie. AAGGGTC)! In order to prevent that, you could use a negative lookbehind to ensure that the character before the first A isn't another A:

/(?<!A)(AA)+G(GG)*TC/

...except that MySQL doesn't support lookarounds in their regexes.

What you can do instead is specify that the pattern either starts at the beginning of the string (anchored by ^), or is preceded by a character that's not A:

/(^|[^A])(AA)+G(GG)*TC/

But note that with this pattern an extra character will be captured if the pattern isn't found at the start of the string so you'll have to chop of the first character if it's not an A.

like image 160
Daniel Vandersluis Avatar answered Sep 20 '22 07:09

Daniel Vandersluis