
Python regular expression match multiple words anywhere

Tags: python, regex

I'm trying to use Python's regular expressions to find several words in a string. For example, the string is "These are oranges and apples and pears, but not pinapples or ..". The words I want to find are 'and', 'or' and 'not', regardless of their order or position.

I tried r'AND | OR | NOT' but it didn't work.

I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$' and it still didn't work.

I'm not good at regular expressions. Any hints? Thanks!

asked Nov 18 '14 01:11 by JudyJiang


2 Answers

You've got a few problems there.

First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.
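As a quick sketch of the case-sensitivity point, the snippet below runs the same uppercase pattern with and without the flag on the sentence from the question. The variable names are only for illustration.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Without a flag, the uppercase pattern finds nothing in the lowercase text.
case_sensitive = re.findall(r'\bAND\b', s)

# With re.IGNORECASE (alias re.I), the same pattern matches 'and'.
case_insensitive = re.findall(r'\bAND\b', s, flags=re.I)

print(case_sensitive)    # []
print(case_insensitive)  # ['and', 'and']
```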

Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ' (with a trailing space), not 'AND'. If you wanted that, you probably wanted a space on each side of every word, not just where you put them (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).
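To make the difference between a literal space and \b concrete, here is a small sketch using the two example phrases mentioned above. The names spaced, bounded and verbose are hypothetical, chosen for this illustration only.

```python
import re

# The space is a literal part of this pattern: 'AND' followed by one space.
spaced = re.compile(r'AND ', flags=re.I)

# \b matches at the edge of a word without consuming any characters.
bounded = re.compile(r'\bAND\b', flags=re.I)

hit_in_band = bool(spaced.search('band leader'))            # matches 'and ' inside 'band '
bounded_in_band = bool(bounded.search('band leader'))       # no boundary between 'b' and 'a'
bounded_at_start = bool(bounded.search('And another thing'))

print(hit_in_band, bounded_in_band, bounded_at_start)  # True False True

# With re.VERBOSE (alias re.X), unescaped whitespace in the pattern is ignored,
# so the pattern can be spaced out for readability.
verbose = re.compile(r'\bAND\b | \bOR\b', flags=re.I | re.X)
verbose_hits = verbose.findall('this and that or those')
print(verbose_hits)  # ['and', 'or']
```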

Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.
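The difference between match and the scanning functions can be sketched like this. It uses a non-capturing group, (?:...), as a compact way to write the alternation; that is a stylistic choice for this example, not something the question requires.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r'\b(?:and|or|not)\b', flags=re.I)

# match() only succeeds if the pattern matches at the very start of the string.
at_start = pattern.match(s)
print(at_start)  # None, because the string starts with 'These'

# search() scans the string and returns the first match anywhere.
first = pattern.search(s)
print(first.group(), first.start())  # and 18

# finditer() yields a match object for every non-overlapping occurrence,
# which is handy when the positions are needed as well as the text.
words_found = [m.group() for m in pattern.finditer(s)]
print(words_found)  # ['and', 'and', 'not', 'or']
```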

So:

>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']


answered Nov 18 '22 16:11 by abarnert


Try this:

>>> import re
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']

a|b means match either a or b

\b represents a word boundary

re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string, in the order they appear
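The same three pieces can be combined for an arbitrary word list. The sketch below assumes the words should match case-insensitively, and it also covers one possible reading of the question, namely checking that every word occurs at least once. find_words and contains_all are hypothetical helper names, not part of the re module.

```python
import re

def find_words(words, text):
    """Return every whole-word occurrence of any of `words` in `text`, in order."""
    # re.escape guards against words that contain regex metacharacters.
    alternation = '|'.join(re.escape(w) for w in words)
    pattern = re.compile(r'\b(?:' + alternation + r')\b', flags=re.I)
    return pattern.findall(text)

def contains_all(words, text):
    """True if each word in `words` occurs at least once as a whole word."""
    found = {m.lower() for m in find_words(words, text)}
    return all(w.lower() in found for w in words)

s = "These are oranges and apples and pears, but not pinapples or .."
print(find_words(['and', 'or', 'not'], s))    # ['and', 'and', 'not', 'or']
print(contains_all(['and', 'or', 'not'], s))  # True
print(contains_all(['and', 'nor'], s))        # False
```

Note that the \b anchors only behave as whole-word markers when each word starts and ends with a word character, which holds for plain words like these.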

answered Nov 18 '22 18:11 by Vedaad Shakib