
Positive Lookbehind greedy

I think I misunderstand how a positive lookbehind works in regex. Here is an example:

12,2 g this is fully random
89 g random string 2
0,6 oz random stuff
1 really random stuff

Let's say I want to match everything after the measuring unit, so I want "this is fully random", "random string 2", "random stuff" and "really random stuff".

In order to do that I tried the following pattern:

(?<=(\d(,\d)?) (g|oz)?).*

But as "?" means 0 or 1, it seems that the pattern prioritizes 0 over 1 in that case, so the unit is still included in the match.

But the measuring unit has to stay optional, as it won't necessarily be in the string (see the fourth line)...

Any idea on how to deal with that issue? Thanks!

Asked by mnd on Mar 03 '23 at 00:03


1 Answer

It is easier to see what happens by looking at the positions where the assertion matches. The assertion (?<=(\d(,\d)?) (g|oz)?) is true at any position where the text directly to the left is a digit (optionally followed by a comma and another digit), then a space, then optionally g or oz.

The engine scans from left to right, and the assertion is true at more than one position. At the first such position, which is right after the space and before the unit, .* (any character, 0 or more times) matches everything up to the end of the line, so the unit ends up inside the match.

See the positions on regex101
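
The positions where the assertion holds can also be listed with a small script. This is only a sketch: Python's built-in re module rejects variable-width lookbehinds, so the lookbehind is emulated here by checking whether its body matches text ending exactly at each index. The helper names lookbehind_positions and first_match are made up for this illustration.

```python
import re
from typing import List, Optional

# Body of the lookbehind from the question, with \Z appended so that it has
# to match text ending exactly at the position being tested.
LOOKBEHIND_BODY = re.compile(r"\d(,\d)? (g|oz)?\Z")


def lookbehind_positions(line: str) -> List[int]:
    """Return every index at which the lookbehind assertion would be true."""
    return [i for i in range(len(line) + 1) if LOOKBEHIND_BODY.search(line[:i])]


def first_match(line: str) -> Optional[str]:
    """Emulate (?<=...).* : take the first true position, then the rest of the line."""
    positions = lookbehind_positions(line)
    return line[positions[0]:] if positions else None


for sample in ["12,2 g this is fully random", "1 really random stuff"]:
    print(lookbehind_positions(sample), repr(first_match(sample)))
```

For "12,2 g this is fully random" the assertion is true at index 5 (just before "g") and at index 6 (just after it). The scan stops at index 5, which is why the unit is part of the match.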

What you might do instead is match the number, make the space followed by g or oz optional, and capture the rest of the line in a group. The text you want is then in group 1.

\d+(?:,\d+)?(?: g| oz)? (.*)

Regex demo
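
As a quick check, the pattern above can be run against the four sample lines from the question. This sketch assumes Python's re module; the function name text_after_unit is invented for the example.

```python
import re
from typing import Optional

# Number, optional " g" or " oz", one space, then the rest of the line in group 1.
PATTERN = re.compile(r"\d+(?:,\d+)?(?: g| oz)? (.*)")


def text_after_unit(line: str) -> Optional[str]:
    """Return the text after the number and optional unit, or None if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


samples = [
    "12,2 g this is fully random",
    "89 g random string 2",
    "0,6 oz random stuff",
    "1 really random stuff",
]
for sample in samples:
    print(repr(text_after_unit(sample)))
```

Because the unit is consumed by the pattern rather than checked by a lookbehind, the engine prefers to take " g" or " oz" when it is present and only skips it when the rest of the pattern would otherwise fail.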

Answered by The fourth bird on Mar 28 '23 at 13:03