Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

functional difference between lookarounds and non-capture group?

I'm trying to come up with an example where positive look-around works but non-capture groups won't work, to further understand their usages. The examples I"m coming up with all work with non-capture groups as well, so I feel like I"m not fully grasping the usage of positive look around.

Here is a string, (taken from a SO example) that uses positive look ahead in the answer. The user wanted to grab the second column value, only if the value of the first column started with ABC, and the last column had the value 'active'.

string ='''ABC1    1.1.1.1    20151118    active
          ABC2    2.2.2.2    20151118    inactive
          xxx     x.x.x.x    xxxxxxxx    active'''

The solution given used 'positive look ahead' but I noticed that I could use non-caputure groups to arrive at the same answer. So, I'm having trouble coming up with an example where positive look-around works, non-capturing group doesn't work.

pattern =re.compile('ABC\w\s+(\S+)\s+(?=\S+\s+active)') #solution

pattern =re.compile('ABC\w\s+(\S+)\s+(?:\S+\s+active)') #solution w/out lookaround

If anyone would be kind enough to provide an example, I would be grateful.

Thanks.

like image 222
Moondra Avatar asked Aug 29 '17 17:08

Moondra


1 Answers

The fundamental difference is the fact, that non-capturing groups still consume the part of the string they match, thus moving the cursor forward.

One example where this makes a fundamental difference is when you try to match certain strings, that are surrounded by certain boundaries and these boundaries can overlap. Sample task:

Match all as from a given string, that are surrounded by bs - the given string is bababaca. There should be two matches, at positions 2 and 4.

Using lookarounds this is rather easy, you can use b(a)(?=b) or (?<=b)a(?=b) and match them. But (?:b)a(?:b) won't work - the first match will also consume the b at position 3, that is needed as boundary for the second match. (note: the non-capturing group isn't actually needed here)

Another rather prominent sample are password validations - check that the password contains uppercase, lowercase letters, numbers, whatever - you can use a bunch of alternations to match these - but lookaheads come in way easier:

(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!?.])

vs

(?:.*[a-z].*[A-Z].*[0-9].*[!?.])|(?:.*[A-Z][a-z].*[0-9].*[!?.])|(?:.*[0-9].*[a-z].*[A-Z].*[!?.])|(?:.*[!?.].*[a-z].*[A-Z].*[0-9])|(?:.*[A-Z][a-z].*[!?.].*[0-9])|...
like image 135
Sebastian Proske Avatar answered Oct 01 '22 07:10

Sebastian Proske