I have spent some time learning regular expressions, but I still don't understand how the following trick works to match two words in any order.
import re
reobj = re.compile(r'^(?=.*?(John))(?=.*?(Peter)).*$',re.MULTILINE)
string = '''
John and Peter
Peter and John
James and Peter and John
'''
re.findall(reobj,string)
result
[('John', 'Peter'), ('John', 'Peter'), ('John', 'Peter')]
( https://www.regex101.com/r/qW4rF4/1)
I know the (?=.* ) part is called a positive lookahead, but how does it work in this situation?
Any explanation?
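To recognize multiple words in any order using regex, I'd suggest using a quantifier instead: (\b(james|jack)\b.*){2,} . Unlike lookarounds or mode modifiers, this works in most regex flavours. Note that it only requires two hits from the alternation, so a line that contains the same word twice also matches.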
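To make the behaviour of the quantifier approach concrete, here is a minimal sketch in Python. The pattern is the one suggested above, written with `.*` and no space; the `re.IGNORECASE` flag, the helper name `has_two_hits`, and the sample strings are assumptions added for illustration.

```python
import re

# Quantifier-based pattern: at least two occurrences of "james" or "jack"
# as whole words. IGNORECASE is an assumption so that "James" also matches.
pattern = re.compile(r'(\b(james|jack)\b.*){2,}', re.IGNORECASE)

def has_two_hits(line):
    """Return True if the line contains two hits from the alternation."""
    return pattern.search(line) is not None

print(has_two_hits("James and Jack"))    # both words, one order
print(has_two_hits("Jack and James"))    # both words, other order
print(has_two_hits("James and James"))   # same word twice also counts
print(has_two_hits("James and Peter"))   # only one hit
```

The third call shows the limitation: the pattern counts hits, so it cannot tell "both words" apart from "one word twice". The lookahead version in the question does not have this problem.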
Literal Characters and Sequences
For instance, you might need to search for a dollar sign ("$") in a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter that means "end of line" in regex, you must escape it with a backslash to match it literally.
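As a small illustration of escaping, the sketch below searches a made-up price list in Python. The string `price_list` and the variable names are hypothetical; `re.escape` is the standard-library helper that adds the backslash for you.

```python
import re

# Hypothetical price list used only for this example.
price_list = "apple $3, pear $12, plum 7"

# \$ matches a literal dollar sign followed by digits.
prices = re.findall(r'\$\d+', price_list)

# Unescaped, $ is an end-of-line anchor, so digits can never follow it.
unescaped = re.findall(r'$\d+', price_list)

# re.escape produces the escaped form of a literal string.
escaped_dollar = re.escape('$')

print(prices)
print(unescaped)
print(escaped_dollar)
```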
Basically, (0+1)* matches any sequence of ones and zeroes. So, in your example, (0+1)*1(0+1)* should match any sequence that contains a 1. It would not match 000 , but it would match 010 , 1 , 111 , etc. Here (0+1) means 0 OR 1.
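The notation (0+1) comes from formal language theory; in Python's `re` syntax the same alternation is written `(0|1)`. The sketch below translates the expression on that assumption and checks it against the examples given above. The helper name `contains_one` is made up for illustration.

```python
import re

# (0+1)*1(0+1)* in formal notation, rewritten with | for alternation.
binary_with_one = re.compile(r'(0|1)*1(0|1)*')

def contains_one(bits):
    """Return True if the whole string is a 0/1 sequence containing a 1."""
    return binary_with_one.fullmatch(bits) is not None

for sample in ("000", "010", "1", "111"):
    print(sample, contains_one(sample))
```

`fullmatch` is used so that the whole string has to fit the expression, which is how the formal notation is normally read.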
It does not really match the words in an arbitrary order. The matching is done by .* which consumes whatever comes its way. A positive lookahead only makes an assertion. You have two lookaheads, and they are independent of each other: each one asserts the presence of one word. The tuples in your result come from the capture groups (John) and (Peter) inside the lookaheads. So your regex works like this:
1) (?=.*?(John)) === The line should contain John. Just an assertion; it does not consume anything.
2) (?=.*?(Peter)) === The line should contain Peter. Just an assertion; it does not consume anything.
3) .* === Consume everything if the assertions have passed.
So you see, the order does not matter here. What is important is that both assertions pass.
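The three steps can be checked directly in Python. This is a minimal sketch: `both` is the pattern from the question, while `john_only` and the sample strings are assumptions added to isolate a single lookahead.

```python
import re

# Pattern from the question: two independent lookaheads, then .* consumes.
both = re.compile(r'^(?=.*?(John))(?=.*?(Peter)).*$', re.MULTILINE)

# A single lookahead on its own, to show that it consumes nothing.
john_only = re.compile(r'(?=.*?(John))')

text = "Peter and John"

# The assertion succeeds, yet the match is empty and ends at position 0.
lookahead_match = john_only.match(text)
print(repr(lookahead_match.group(0)), lookahead_match.end())
print(lookahead_match.group(1))   # the capture group is still filled

# With both assertions passed, .* consumes the whole line.
full_match = both.match(text)
print(full_match.group(0))
print(full_match.groups())

# If one assertion fails, nothing matches at all.
print(both.search("James and Peter"))
```

Because each lookahead starts scanning from the same position (the start of the line), it does not matter which word comes first in the text. The groups are always reported in the order they appear in the pattern, which is why every tuple in the question's result reads ('John', 'Peter').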