I would like to write a regex for searching for the existence of some words, but their order of appearance doesn't matter.
For example, search for "Tim" and "stupid". My regex is Tim.*stupid|stupid.*Tim. But is it possible to write a simpler regex (e.g. so that each of the two words appears just once in the regex itself)?
$ means "Match the end of the string" (the position after the last character in the string).
The order of the characters inside a character class does not matter: [abc] and [cba] match the same set of characters.
In regex, for the shorthand character classes (\d, \w, \s), the uppercase form is the inverse of its lowercase counterpart. \d (digit) matches any single digit (same as [0-9]). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9]).
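As a quick check of the \d / \D relationship, here is a minimal Python sketch. The sample string is made up for illustration, and the re.ASCII flag is an assumption added so that \d really is the same as [0-9]; without it, Python's \d also matches other Unicode decimal digits.

```python
import re

sample = "Room 42, floor 7"  # hypothetical sample text

# With re.ASCII, \d is exactly [0-9] and \D is exactly [^0-9].
digits = re.findall(r"\d", sample, re.ASCII)
non_digits = re.findall(r"\D", sample, re.ASCII)

# Every character is matched by exactly one of the two classes.
assert len(digits) + len(non_digits) == len(sample)
print(digits)
```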
In formal-language notation, (0+1) means 0 OR 1, so (0+1)* matches any sequence of ones and zeroes. In your example, (0+1)*1(0+1)* therefore matches any sequence that contains at least one 1. It would not match 000, but it would match 010, 1, 111, etc.
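In practical regex syntax the + is a quantifier rather than "OR", so the formal expression has to be translated before it can be tried out. A minimal Python sketch, assuming [01] as the stand-in for (0+1); the helper name has_a_one is hypothetical:

```python
import re

# Formal-language (0+1)*1(0+1)* written in Python regex syntax:
# the union "0+1" becomes the character class [01]; "*" keeps its meaning.
contains_one = re.compile(r"[01]*1[01]*")

def has_a_one(bits: str) -> bool:
    """Return True if the whole string is a 0/1 sequence containing a 1."""
    return contains_one.fullmatch(bits) is not None

for bits in ["000", "010", "1", "111"]:
    print(bits, has_a_one(bits))
```

fullmatch is used so that the pattern has to describe the entire string, which is how the formal expression is normally read.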
See this regex:
/^(?=.*Tim)(?=.*stupid).+/
Regex explanation:
^
Asserts position at start of string.
(?=.*Tim)
Asserts that "Tim" is present in the string.
(?=.*stupid)
Asserts that "stupid" is present in the string.
.+
Now that both words are known to be present, the string is valid. Use .+ (or the possessive .++, where the engine supports it) to match the entire string.
To assert additional words, add another (?=.*<to_assert>) group. If you only need to test for a match, the regex can be simplified to /^(?=.*Tim).*stupid/, although the matched text then ends at "stupid" rather than covering the whole string.
>>> import re
>>> text = """
... Tim is so stupid.
... stupid Tim!
... Tim foobar barfoo.
... Where is Tim?"""
>>> # Full pattern: both lookaheads, then .+ consumes the whole line.
>>> m = re.findall(r'^(?=.*Tim)(?=.*stupid).+$', text, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']
>>> # Simplified pattern: the same lines match, but each match ends at "stupid".
>>> m = re.findall(r'^(?=.*Tim).*stupid', text, re.MULTILINE)
>>> m
['Tim is so stupid', 'stupid']
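The lookahead approach extends to any number of words, since each word only needs its own (?=.*word) group. Below is a minimal sketch of that idea; the helper name all_words_pattern is hypothetical, and re.escape is added on the assumption that the search words may contain regex metacharacters.

```python
import re

def all_words_pattern(words):
    """Build a pattern matching any line that contains every word, in any order."""
    # One lookahead per word; re.escape guards against regex metacharacters.
    lookaheads = "".join(f"(?=.*{re.escape(w)})" for w in words)
    return re.compile(f"^{lookaheads}.+$", re.MULTILINE)

text = "Tim is so stupid.\nstupid Tim!\nTim foobar barfoo.\nWhere is Tim?"
pattern = all_words_pattern(["Tim", "stupid"])
print(pattern.findall(text))
```

If substring hits such as "Timothy" should not count, wrapping each escaped word in \b would be one option.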