I am having trouble getting this regex to work and none of the canned ones I have found work reliably.
The desired result:
Produce the following via regex matches:
"Person One"
"Person Two"
"Person Three"
Out of these example lines:
By Person One, Person Two and Person Three
By Person One, Person Two
By Person One
By Person Two and Person Three
Here is what I have. Note that if I break the pattern into sections, I get partial matches, but something about the lookbehind is throwing it off. Also, is there a simpler but still reliable way to pull all the "Persons", whether one, two, or three are listed (with an "and" before the last one)? It does not have to support more than three, but I would think that as long as the "and" comes last, the number of "Persons" could stay variable without changing the regex.
Saved current attempt (it matches only one line, but if I split off my "and" lookbehind and run it separately, it does match all the "and" lines):
(?<=by )((\w+) (\w+))(?:,\s*)?((\w+) (\w+))?(?:\s*(?<=and ))((\w+) (\w+))
https://regex101.com/r/z3Y9TQ/1
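Running the attempt line by line makes the "matches one" behavior visible. This sketch uses Python's re module rather than regex101's default PCRE flavor, and it assumes case-insensitive matching (re.IGNORECASE), because the lookbehind says "by " while the lines start with "By". Only the last line matches, and only because the engine backtracks into an odd split of the words so that "and " ends up right before the final group.

```python
import re

# The attempt from the question, unchanged.
ATTEMPT = r"(?<=by )((\w+) (\w+))(?:,\s*)?((\w+) (\w+))?(?:\s*(?<=and ))((\w+) (\w+))"

LINES = [
    "By Person One, Person Two and Person Three",
    "By Person One, Person Two",
    "By Person One",
    "By Person Two and Person Three",
]

results = []
for line in LINES:
    # Case-insensitive so "by " also matches "By " (an assumption about the flags used).
    m = re.search(ATTEMPT, line, re.IGNORECASE)
    # Keep the three outer groups (1, 4, 7), or None when the line does not match.
    results.append(m.group(1, 4, 7) if m else None)

for line, res in zip(LINES, results):
    print(f"{line!r} -> {res}")
```

On the last line the outer groups come out as 'Person Tw', 'o and' and 'Person Three': the only way to satisfy (?<=and ) is for the optional middle group to swallow "and" itself.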
Instead of using a lookbehind to check for "and", you can use a non-capturing group, as you did with the comma:
(?<=by )(\w+ \w+)(?:,\s*)?(\w+ \w+)?(?:\sand\s)?(\w+ \w+)?
Note that you don't need to wrap each \w+ in its own group.
Try it online.
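A quick offline check of the pattern above, sketched with Python's re module and assuming case-insensitive matching (the lookbehind is "by " but the lines start with "By"). It also compares how many capture groups each pattern has, which is what dropping the per-word groups buys you.

```python
import re

# Pattern from this answer: one capture group per name.
ANSWER = r"(?<=by )(\w+ \w+)(?:,\s*)?(\w+ \w+)?(?:\sand\s)?(\w+ \w+)?"
# The question's attempt, used here only to compare group counts.
ATTEMPT = r"(?<=by )((\w+) (\w+))(?:,\s*)?((\w+) (\w+))?(?:\s*(?<=and ))((\w+) (\w+))"

LINES = [
    "By Person One, Person Two and Person Three",
    "By Person One, Person Two",
    "By Person One",
    "By Person Two and Person Three",
]

names = []
for line in LINES:
    m = re.search(ANSWER, line, re.IGNORECASE)
    # Optional groups that did not participate come back as None; drop them.
    names.append([g for g in m.groups() if g is not None])

print(names)
# Number of capture groups in each pattern: 9 for the attempt, 3 for the answer.
print(re.compile(ATTEMPT).groups, re.compile(ANSWER).groups)
```

Because each name is a single group, the results map directly onto "Person One", "Person Two" and "Person Three" instead of being spread over nested word groups.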
The reason the lookbehind won't work here is that you placed it in the middle of your pattern, and a lookbehind doesn't behave the way you expect there. A lookbehind is zero-width: in (?<=prior)subsequent, the engine first matches everything in the pattern that comes before the lookbehind, then checks that the text immediately to the left of the current position matches prior, and only if it does goes on to match subsequent from that same position. So whatever comes before the lookbehind in the pattern must itself end with prior, and at the same time subsequent must start right where that earlier part ended. See where the problem comes from?
Therefore, in your example, the only way to match the full sentence with the lookbehind in the middle is to also include the "and" in the pattern itself, which makes the lookbehind redundant.
To illustrate, take a look at this demo. As you can see, the pattern (?<=and )Person matches "Person" when it comes after "and ". Now let's change it to Two (?<=and )Person. You'd probably think it will work, but it actually finds no matches: after the engine matches "Two ", the lookbehind requires the four characters just before the current position to be "and ", but they are "Two ", so the attempt fails. (Even without the lookbehind, "Person" doesn't immediately follow "Two " in the text.)
The only way to make the lookbehind work in this case is to also include the "and" right after "Two", like this: Two and (?<=and )Person, which makes the lookbehind redundant, as explained above.
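The same four patterns can be tried outside the demo. This is a small sketch using Python's re.findall on the first example line (Python is an assumption here; the lookbehinds used are fixed-width, so they behave the same way in PCRE).

```python
import re

text = "By Person One, Person Two and Person Three"

# Lookbehind at the start of the pattern: "Person" preceded by "and ".
start = re.findall(r"(?<=and )Person", text)
# Lookbehind in the middle: after "Two " the text to the left is "Two ", not "and ".
middle = re.findall(r"Two (?<=and )Person", text)
# Works only once "and " is matched explicitly before the lookbehind...
explicit = re.findall(r"Two and (?<=and )Person", text)
# ...at which point the lookbehind adds nothing.
plain = re.findall(r"Two and Person", text)

print(start, middle, explicit, plain)
```

The last two calls return the same thing, which is the redundancy described above.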
Similar behavior for lookaheads (i.e., when a lookahead appears in the middle of a pattern) is explained very well in this excellent answer by revo.
Hope that helps.
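For the "variable number of Persons" part of the question, a different technique (not taken from either answer) is to match each name on its own and collect all matches, anchoring every name with one of several lookbehinds. This is a sketch assuming Python's re with re.IGNORECASE and, like the patterns above, names of exactly two words separated by "by ", ", " or "and " as in the examples. Python requires each lookbehind to be fixed-width, so the three separators go in three separate lookbehinds rather than one alternation.

```python
import re

# Hypothetical alternative: a name is two words preceded by "by ", ", " or "and ".
# Each lookbehind has a fixed width on its own, which Python's re requires.
NAME = re.compile(r"(?:(?<=by )|(?<=, )|(?<=and ))\w+ \w+", re.IGNORECASE)

LINES = [
    "By Person One, Person Two and Person Three",
    "By Person One, Person Two",
    "By Person One",
    "By Person Two and Person Three",
]

# With no capture groups, findall returns the whole matched text for each name.
all_names = [NAME.findall(line) for line in LINES]
for line, found in zip(LINES, all_names):
    print(f"{line!r} -> {found}")
```

Since each match is one name, the pattern does not grow with the number of names; the trade-off is that it relies on the exact separators shown in the example lines.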
I can't seem to get the lookbehind for "and" working, but this works with a non-capturing group:
(?<=by )(\w+ \w+)(?:, *)?(\w+ \w+)?(?: *)(?:and (\w+ \w+))?
I changed \s to a literal space in the regex so it won't match newlines.
DEMO
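A sketch of this pattern run over all four lines at once, as a multi-line regex tester would, using Python's re.finditer and assuming case-insensitive matching. It checks that, with literal spaces instead of \s, no match runs past the end of its own line.

```python
import re

SECOND = r"(?<=by )(\w+ \w+)(?:, *)?(\w+ \w+)?(?: *)(?:and (\w+ \w+))?"

# All example lines in one string, one per line.
TEXT = "\n".join([
    "By Person One, Person Two and Person Three",
    "By Person One, Person Two",
    "By Person One",
    "By Person Two and Person Three",
])

matches = list(re.finditer(SECOND, TEXT, re.IGNORECASE))
rows = [m.groups() for m in matches]

# True when no matched span contains a newline.
stays_on_line = all("\n" not in m.group(0) for m in matches)
print(stays_on_line)
for row in rows:
    print(row)
```

Each row has one slot per name, with None where a line has fewer than three names.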