Can you explain why \G in my Perl regex pattern behaves this way?

Tags:

2 Answers

As @Tomalak said, you don't need *?; it is the source of the confusion in your situation. Here is what is going on in your first piece of code:

It sees that (\w\w\w)*? is lazy and may match zero times, so it first skips the group and tries to match TGA at the start of the string. That fails, so the engine backtracks and lets the group match three word characters, ATC, then tries TGA again. That fails too, so it reads another three \w characters; so far the engine has consumed ATCGTT.

It tries TGA again with no luck, backtracks once more and reads another \w\w\w, so now it has consumed ATCGTTGAA. But the first TGA starts at offset 5, straddling the triplets GTT and GAA, so an attempt that advances three characters at a time from offset 0 steps right over it. This is why the engine fails to find the first TGA and never reports its position.
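To see why the first TGA is missed, it can help to list where TGA actually occurs and check which offsets a triplet-stepping match anchored at offset 0 can reach. This is a minimal sketch using only the core index function; the variable names are illustrative.

```perl
use strict;
use warnings;

my $dna = "ATCGTTGAATGCAAATGACATGAC";

# Collect every offset where the literal "TGA" begins.
my @offsets;
my $from = 0;
while ((my $i = index($dna, "TGA", $from)) != -1) {
    push @offsets, $i;
    $from = $i + 1;
}

# Starting at offset 0, (\w\w\w)*? can only advance to offsets 0, 3, 6, ...
# so TGA is reachable in that attempt only when its offset is divisible by 3.
for my $i (@offsets) {
    my $status = $i % 3 == 0 ? "on the triplet grid" : "off the triplet grid";
    print "TGA at offset $i: $status\n";
}
```

This reports TGA at offsets 5, 15 and 20, and only 15 is on the grid, which is why the first match ends at 18 (15 + 3).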

The engine continues in this manner until it finds the TGA right after AAA (if you keep stepping through it as I did, you will see how this happens), and then it runs the body of the loop, printing 18.

Since you used the /g modifier, the next match attempt starts where the first one ended, at offset 18, and fails. The engine then retries one character further along, and so on, until at offset 20 the group matches zero times and TGA matches directly, ending the match at 23, which is printed.
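The walk-through above can be reproduced with the snippet below. The question's code is not shown here, so the pattern /(\w\w\w)*?TGA/g is an assumption based on this answer's description. Note that pos returns the offset just past the end of the match, so 18 means the TGA occupies offsets 15 to 17.

```perl
use strict;
use warnings;

my $dna = "ATCGTTGAATGCAAATGACATGAC";
my @ends;

# Assumed first version: a lazy run of triplets followed by TGA.
while ($dna =~ /(\w\w\w)*?TGA/g) {
    push @ends, pos $dna;    # remember where each match ended
    print "Got a TGA stop codon at position ", pos $dna, "\n";
}
```

Under that assumption it prints positions 18 and 23: the second match starts at offset 20, where the group repeats zero times.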

So why does the second version match only one position, 18? What is the effect of using the \G anchor?

Everything works the same as before up to the first match, right after AAA. When the next match starts, the engine first tries \G, which asserts that the match begins exactly where the previous one ended (just after AAATGA). That succeeds, but the rest of the pattern fails from there. This time, when the engine tries to start one, two, three or more characters further along, it must match \G first, and \G can only succeed at the end of the previous match (after AAATGA). So every later attempt fails, and only the single position 18 is reported.
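Assuming the second version simply prefixed the same pattern with \G (again a reconstruction, since the original snippet isn't shown), the following sketch shows the single match. It also checks pos after the loop, since a failed m//g match without the /c modifier resets the search position.

```perl
use strict;
use warnings;

my $dna = "ATCGTTGAATGCAAATGACATGAC";
my @ends;

# Assumed second version: \G forces every attempt to start exactly
# where the previous match ended (offset 0 on the first attempt).
while ($dna =~ /\G(\w\w\w)*?TGA/g) {
    push @ends, pos $dna;
    print "Got a TGA stop codon at position ", pos $dna, "\n";
}

# The failed attempt at offset 18 ends the loop and, without /c, resets pos.
my $status = defined pos($dna) ? "pos is still set" : "pos was reset";
print "$status\n";
```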

Simply remove *?, as @Tomalak said.

answered Nov 15 '22 05:11

Ibrahim Najjar

You don't need to use *? at all.

use strict;
use warnings;

my $dna = "ATCGTTGAATGCAAATGACATGAC";
while ($dna =~ /(?:\w\w\w)TGA/g) {
    # pos returns the offset just past the end of the current match
    print "Got a TGA stop codon at position ", pos $dna, "\n";
}

prints

Got a TGA stop codon at position 8
Got a TGA stop codon at position 18

Note that *? lets the preceding atom repeat zero or more times, so it is effectively optional, but you actually want it to be required.
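To illustrate that *? makes the triplet group optional, a quick check (a sketch; the test string is made up) compares the lazy group against a required one on a bare TGA.

```perl
use strict;
use warnings;

# With *? the triplet group may repeat zero times, so a bare "TGA"
# still matches and the capture group is left undefined.
my $lazy_matched = "TGA" =~ /(\w\w\w)*?TGA/ ? 1 : 0;
my $lazy_capture = defined $1 ? $1 : "undef";

# Without a quantifier the group is required: three characters must
# precede TGA, so the bare string no longer matches.
my $required_matched = "TGA" =~ /(\w\w\w)TGA/ ? 1 : 0;

print "lazy: matched=$lazy_matched, \$1=$lazy_capture\n";
print "required: matched=$required_matched\n";
```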

The non-capturing group (?: ...) is not really necessary. You could use a normal group, or no group at all.
Another variant would be /[TGAC]{3}TGA/g.
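The variants mentioned here can be compared side by side. This sketch uses a small helper, match_ends (a hypothetical name, not part of any library), that collects pos after each /g match. All four patterns should report the same positions, 8 and 18, for this string because it contains only A, C, G and T.

```perl
use strict;
use warnings;

my $dna = "ATCGTTGAATGCAAATGACATGAC";

# Hypothetical helper: return the end offset (pos) of every /g match.
sub match_ends {
    my ($string, $re) = @_;
    my @ends;
    while ($string =~ /$re/g) {
        push @ends, pos $string;
    }
    return @ends;
}

my %results = (
    'non-capturing group' => [ match_ends($dna, qr/(?:\w\w\w)TGA/) ],
    'capturing group'     => [ match_ends($dna, qr/(\w\w\w)TGA/) ],
    'no group'            => [ match_ends($dna, qr/\w\w\wTGA/) ],
    'character class'     => [ match_ends($dna, qr/[TGAC]{3}TGA/) ],
);

for my $name (sort keys %results) {
    print "$name: @{ $results{$name} }\n";
}
```

The character-class form only differs from \w\w\w on input containing letters other than A, C, G and T (or digits and underscores), which is why it can be the stricter choice for DNA strings.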

answered Nov 15 '22 05:11

Tomalak


Tags: regex, perl

user2677944
