I'm trying to use Python's regular expressions to match a string containing several words. For example, the string is "These are oranges and apples and pears, but not pinapples or .." The words I want to find are 'and', 'or' and 'not', regardless of their order or position.
I tried r'AND | OR | NOT' but it didn't work.
I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$' but that still didn't work.
I'm not good at regular expressions. Any hints? Thanks!
You've got a few problems there.
First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.
Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on each side, not just those sides (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).
Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.
So:
>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
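To see each of those problems in isolation, here is a minimal sketch that runs the original pattern and a few variations against the same string. The variable names are only illustrative, and the 'band leader' and 'And another thing' inputs are the examples mentioned above.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Case-sensitive and without re.X: uppercase words plus literal spaces find nothing.
naive = re.findall(r'AND | OR | NOT', s)

# re.I alone fixes the case, but the spaces are still part of each alternative.
spaced = re.findall(r'AND | OR | NOT', s, flags=re.I)

# A trailing space is not a word boundary: 'AND ' also matches inside 'band leader'.
false_hit = re.search(r'AND ', 'band leader', flags=re.I)

# \b anchors on word boundaries, so it rejects 'band' but accepts a leading 'And'.
bounded = re.compile(r'\bAND\b', flags=re.I)
no_hit = bounded.search('band leader')
lead_hit = bounded.search('And another thing')

# match() only tries the start of the string; search() scans all of it.
at_start = bounded.match(s)
anywhere = bounded.search(s)

print(naive)
print(spaced)
print(false_hit is not None, no_hit, lead_hit.group())
print(at_start, anywhere.group())
```

Note how the case-insensitive but space-sensitive pattern returns the words with their surrounding spaces attached, which is usually not what you want.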
Try this:
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']
a|b means match either a or b.
\b represents a word boundary.
re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string.
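If the goal is to check that every word appears somewhere, in any order, one possible approach is to build the alternation from a word list and compare the set of matches with the set of targets. This is only a sketch: words_found is a hypothetical helper, not part of the re module, and it assumes each target begins and ends with a letter, digit, or underscore so that \b behaves as expected.

```python
import re

def words_found(text, words):
    """Return the set of target words (lowercased) that occur as whole words in text."""
    # re.escape guards against regex metacharacters in the word list.
    alternation = "|".join(re.escape(w) for w in words)
    # The non-capturing group keeps findall returning the whole match.
    pattern = re.compile(r"\b(?:" + alternation + r")\b", flags=re.IGNORECASE)
    return {m.lower() for m in pattern.findall(text)}

s = "These are oranges and apples and pears, but not pinapples or .."
targets = ["and", "or", "not"]

found = words_found(s, targets)
has_all = found == set(targets)  # True only if every target occurs at least once
print(sorted(found), has_all)
```

Because the result is a set, repeated words such as the two occurrences of 'and' count once, and the order in which the words appear does not matter.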