
What's the benefit of using /.*?/

In some Rails code (Cucumber step definitions, JavaScript files, the rails_admin gem) I found regular expressions like this:

string =~ /some regexp.+rules should match "(.*?)"/i

I have some knowledge of regular expressions, and I know that the * and ? symbols are similar: the asterisk means zero or more, while the question mark means the preceding item may or may not be present.

So putting a question mark after a group of symbols makes that group optional in the string being tested. What is the point of putting it after something that is already optional (as far as I know, the asterisk already takes care of that)?

shybovycha asked Nov 15 '12 16:11



2 Answers

Right after a quantifier (like *), the ? has a different meaning: it makes the quantifier "ungreedy" (also called lazy). So while the default is that * consumes as much as possible, *? matches as little as possible.

In your specific case, this is relevant for strings like this:

some regexp rules should match "some string" or "another"

Without the question mark, the regex matches the full string (because .* can consume " just like anything else) and some string" or "another is captured. With the question mark, the match stops as soon as possible (so after ...some string") and captures only some string.
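
To make the difference concrete, here is a small sketch that runs both variants of the pattern from the question against the sample string above. The variable names are only for illustration:

```ruby
# The sample string from above.
text = 'some regexp rules should match "some string" or "another"'

# Greedy: .* runs to the end of the string, then backtracks to the LAST double quote.
greedy = text[/some regexp.+rules should match "(.*)"/i, 1]

# Lazy: .*? grows one character at a time and stops at the FIRST closing quote.
lazy = text[/some regexp.+rules should match "(.*?)"/i, 1]

puts greedy  # some string" or "another
puts lazy    # some string
```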

Further reading.

Martin Ender answered Sep 27 '22 20:09



? has dual meaning.

/foo?/

means the last o can be there zero or one times.

/foo*?/ 

means the last o can be there zero or more times, but the engine selects the minimum number that still lets the match succeed, i.e., it's non-greedy.

These might help explain:

'foo'[/foo?/]   # => "foo"
'fo'[/foo?/]    # => "fo"
'fo'[/foo*?/]   # => "fo"
'foo'[/foo*?/]  # => "fo"
'fooo'[/foo*?/] # => "fo"
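
One more case worth seeing next to these: "minimum" means the minimum that still lets the whole pattern match, so a lazy quantifier does take more when whatever follows it demands that. A minimal sketch, with strings and variable names made up for illustration:

```ruby
# Lazy o*? starts with zero o's and adds one at a time only when the rest fails.
forced   = 'fooob'[/foo*?b/]    # the b can only match after all three o's
anchored = 'fooo'[/\Afoo*?\z/]  # \z forces the match to reach the end of the string
minimal  = 'fooo'[/foo*?/]      # nothing follows, so zero extra o's are enough

p forced    # => "fooob"
p anchored  # => "fooo"
p minimal   # => "fo"
```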

The non-greedy use of ? is unfortunate, I think. They reused an operator we expected to have a single meaning, "zero or one", and threw it at us in a way that can be really difficult to decipher.

But the need was genuine: too many times we'd write a pattern that would go wildly wrong, gobbling everything in sight, because the regex engine was doing exactly what we said when it met character patterns we hadn't foreseen. Regex can be very complex and convoluted, but the "non-greedy" use of ? helps tame that. Sometimes using it is the sloppy or quick-n-dirty way out, because we don't have time to rewrite the pattern to do it correctly. Sometimes it's the magic bullet and is elegant. I think which one it is depends on whether you're under a deadline and writing code to get something done, or you're debugging years after the fact and have finally found that ? wasn't the optimal fix.
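
As an illustration of what such a rewrite could look like (this is an assumption about one common option, not necessarily the one meant above): a negated character class such as [^"]* says outright that the capture must not contain a quote, whereas .*? only prefers a short match and will still run across a quote if the rest of the pattern requires it. The strings and variable names below are hypothetical:

```ruby
simple = 'should match "some string" or "another"'
tricky = '"a" and "b" or "c"'

# On the simple string, both approaches capture the same thing.
lazy_simple    = simple[/match "(.*?)"/, 1]
negated_simple = simple[/match "([^"]*)"/, 1]

# With more pattern after the closing quote, the lazy version starts at the
# first quote and stretches across later quotes until ' or' finally matches.
lazy_tricky    = tricky[/"(.*?)" or/, 1]
# The negated class can never cross a quote, so only "b" qualifies.
negated_tricky = tricky[/"([^"]*)" or/, 1]

p lazy_simple     # => "some string"
p negated_simple  # => "some string"
p lazy_tricky     # => "a\" and \"b"
p negated_tricky  # => "b"
```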

the Tin Man answered Sep 27 '22 22:09
