Meaning of a `+` following a `*`, when the latter is used as a quantifier in a regular expression

Tags:

ruby

Today I came across the following regular expression and wanted to know what Ruby would do with it:

> "#a" =~ /^[\W].*+$/
=> 0
> "1a" =~ /^[\W].*+$/
=> nil

In this instance, Ruby seems to be ignoring the + character. If that is incorrect, I'm not sure what it is doing with it. I'm guessing it's not being interpreted as a quantifier, since the * is not escaped and is itself being used as a quantifier. In Perl/Ruby regexes, a character (e.g., -) that appears in a context where it cannot be interpreted as a special character is sometimes treated as a literal. But if that were happening in this case, I would expect the first match to fail, since there is no + in the string on the left-hand side of =~.

Is this a subtly correct use of the + character? Is the above behavior a bug? Am I missing something obvious?

377

asked Sep 24 '13 00:09

Eric Walker

1 Answer

Well, you can certainly use a + after a *. You can read a bit about it on this site. The + after the * is called a possessive quantifier.

What does it do? It prevents the * from backtracking.

Ordinarily, when you use something like .*c to match abcde, the .* will first match the whole string (abcde). Since the regex then cannot match c after the .*, the engine goes back one character at a time to check whether there is a match (this is backtracking).

Once it has backtracked to c, you will get the match abc from abcde.
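To see the greedy case for yourself, here is a minimal Ruby sketch (the variable names `text` and `greedy_match` are just for illustration). It only shows the final result of the backtracking described above, not the individual steps.

```ruby
text = "abcde"

# Greedy .* first grabs the whole string, then gives characters back
# one at a time until the following "c" can match.
greedy_match = text[/.*c/]

puts greedy_match.inspect
```

The match stops at the c, so the trailing de is not part of it.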

Now, imagine that the engine has to backtrack over a few hundred characters. If you have nested groups and multiple * (or + or the {m,n} form), you can quickly end up with thousands or even millions of backtracking steps. This is called catastrophic backtracking.

This is where possessive quantifiers come in handy. They prevent the engine from backtracking into whatever the quantifier has already matched. With the example above, abcde will not be matched by .*+c: once .*+ has consumed the whole string, it cannot give anything back, and since there is nothing left for c to match, the match fails.

So possessive quantifiers can also improve the performance of some regexes, provided the engine supports them.

For your regex /^[\W].*+$/, though, I don't think the possessive quantifier provides any improvement (maybe a tiny one). Lastly, the regex can easily be rewritten as /^\W.*+$/.
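A small Ruby sketch of the difference, assuming a Ruby version whose regex engine supports possessive quantifiers (1.9 or later). The atomic group `(?>...)` and the `[^c]*+c` pattern are my additions for comparison and are not part of the original regex. The variable names are illustrative.

```ruby
text = "abcde"

# Greedy: .* backtracks until "c" can match.
greedy_index = text =~ /.*c/

# Possessive: .*+ keeps everything it consumed, so "c" never gets a chance.
possessive_index = text =~ /.*+c/

# An atomic group around the greedy quantifier behaves the same way.
atomic_index = text =~ /(?>.*)c/

# A possessive quantifier still matches when no backtracking is needed.
no_backtrack_index = "abc" =~ /[^c]*+c/

p greedy_index, possessive_index, atomic_index, no_backtrack_index
```

The last pattern shows that a possessive quantifier is not a problem in itself. It only changes the result when the match depends on giving characters back.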
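As a quick check, the sketch below compares the original pattern, the simplified one, and a plain greedy version on a few sample strings. The samples and the names `patterns`, `samples` and `results` are made up for illustration. Agreement on these inputs does not prove the patterns are equivalent in general.

```ruby
patterns = {
  original:   /^[\W].*+$/,
  simplified: /^\W.*+$/,
  greedy:     /^\W.*$/
}

# Hypothetical sample inputs, including the two from the question.
samples = ["#a", "1a", "#", "", " x+y", "_a"]

# For each pattern, record the match index (or nil) for every sample.
results = patterns.transform_values do |re|
  samples.map { |s| s =~ re }
end

results.each { |name, indexes| puts "#{name}: #{indexes.inspect}" }
```

Since .* is followed only by $, it never needs to give anything back on a single-line string, which is why the possessive + makes no visible difference here.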

164

answered Oct 05 '22 13:10

Jerry
