
Ruby Regexp: + vs *. special behaviour?

Tags: regex, ruby

Using Ruby regexps I get the following results:

>> 'foobar'[/o+/]
=> "oo"
>> 'foobar'[/o*/]
=> ""

But:

>> 'foobar'[/fo+/]
=> "foo"
>> 'foobar'[/fo*/]
=> "foo"

The documentation says:
*: zero or more repetitions of the preceding
+: one or more repetitions of the preceding

So I would expect 'foobar'[/o*/] to return the same result as 'foobar'[/o+/].

Does anybody have an explanation for that?

seb asked Mar 24 '10 12:03


2 Answers

'foobar'[/o*/] matches the zero o's that appear before the f, at position 0, so the result is the empty string.
'foobar'[/o+/] can't match there because it needs at least one o, so it instead matches all the o's starting at position 1.

Specifically, the matches you are seeing are

'foobar'[/o*/] => '<>foobar'
'foobar'[/o+/] => 'f<oo>bar'
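
The two match positions can be checked directly with MatchData. This is a minimal sketch; the variable names (m_star, m_plus, star_info, plus_info) are only illustrative:

```ruby
# String#match returns a MatchData object that records where the match sits.
m_star = 'foobar'.match(/o*/)
m_plus = 'foobar'.match(/o+/)

# [matched text, start offset, end offset]
star_info = [m_star[0], m_star.begin(0), m_star.end(0)]  # => ["", 0, 0]
plus_info = [m_plus[0], m_plus.begin(0), m_plus.end(0)]  # => ["oo", 1, 3]
```

The empty match for /o*/ starts and ends at offset 0, before the f, while the /o+/ match covers offsets 1 up to 3.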

Gareth answered Sep 22 '22 16:09


This is a common misunderstanding of how regular expressions work.

Although the * is greedy and the pattern isn't anchored to the start of the string, the regexp engine still starts looking at the beginning of the string and reports the first position where the pattern matches. In the case of /o+/, there is no match at position 0 (i.e. "f"), and since + means one or more, the engine has to move on to the next position (this has nothing to do with greediness) until a match is found or all positions have been tried.

With /o*/, however, which as you know means zero or more times, the pattern already succeeds at position 0 by matching zero o's, i.e. the empty string, so the regexp engine stops right there (as it should, because o* simply means that the o is optional). Once a match has been found at the leftmost position, the engine has no reason to keep looking for a longer one further along.
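
The left-to-right search can be imitated roughly as follows. This is only a sketch of the idea, not how Ruby's engine is actually implemented; leftmost_match is a hypothetical helper, and it relies on StringScanner#scan from the standard library, which only matches at the scanner's current position:

```ruby
require 'strscan'

# Hypothetical helper: try the pattern at each start position, left to right,
# and return [position, matched text] for the first position that succeeds.
# Positions are byte offsets, which is fine for ASCII strings like 'foobar'.
def leftmost_match(string, pattern)
  scanner = StringScanner.new(string)
  (0..string.length).each do |pos|
    scanner.pos = pos
    text = scanner.scan(pattern)  # anchored at pos; nil if no match there
    return [pos, text] if text    # "" is truthy in Ruby, so an empty match counts
  end
  nil
end

leftmost_match('foobar', /o*/)  # => [0, ""]
leftmost_match('foobar', /o+/)  # => [1, "oo"]
```

With /o*/ the loop returns on its very first iteration, because matching zero o's is already a success. With /o+/ the attempt at position 0 fails, so the loop moves on to position 1.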

reko_t answered Sep 23 '22 16:09