
Greedy match with negative lookahead in a regular expression

Tags: python

I have a regular expression in which I'm trying to extract every group of letters that is not immediately followed by a "(" symbol. For example, the following regular expression operates on a mathematical formula that includes variable names (x, y, and z) and function names (movav and movsum), both of which are composed entirely of letters but where only the function names are followed by an "(".

re.findall(r"[a-zA-Z]+(?!\()", "movav(x/2, 2)*movsum(y, 3)*z")

I would like the expression to return the array

['x', 'y', 'z']

but it instead returns the array

['mova', 'x', 'movsu', 'y', 'z']

I can see in theory why the regular expression would be returning the second result, but is there a way I can modify it to return just the array ['x', 'y', 'z']?

Abiel asked Nov 03 '11 17:11


4 Answers

Another solution which doesn't rely on word boundaries:

Check that the letters aren't followed by either a ( or by another letter.

>>> re.findall(r'[a-zA-Z]+(?![a-zA-Z(])', "movav(x/2, 2)*movsum(y, 3)*z")
['x', 'y', 'z']
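
To see why the original pattern returns 'mova', it can help to run the two patterns side by side. This is only an illustrative sketch; the variable names (formula, original, fixed) are assumptions made for the example and do not come from the answer.

```python
import re

formula = "movav(x/2, 2)*movsum(y, 3)*z"

# Original pattern: when the lookahead fails after "movav", the regex engine
# backtracks, gives up the final "v", and retries the lookahead after "mova".
# The next character is now "v", not "(", so the shortened match succeeds.
original = re.findall(r'[a-zA-Z]+(?!\()', formula)

# Adding letters to the lookahead's character class blocks that escape route:
# a match can no longer end in the middle of a run of letters.
fixed = re.findall(r'[a-zA-Z]+(?![a-zA-Z(])', formula)

print(original)  # ['mova', 'x', 'movsu', 'y', 'z']
print(fixed)     # ['x', 'y', 'z']
```

The + quantifier is still greedy in both cases; the difference is only whether backtracking can find a shorter match that satisfies the lookahead.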

taleinat answered Oct 16 '22 17:10


Add a word-boundary matcher \b:

>>> re.findall(r'[a-zA-Z]+\b(?!\()', "movav(x/2, 2)*movsum(y, 3)*z")
['x', 'y', 'z']

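\b matches the empty string at the boundary between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string next to a word character. The pattern therefore looks for letters followed by a word boundary that isn't immediately followed by (. For more details, see the re docs.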
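
Because \b treats digits and underscores as word characters, names that mix letters and digits behave differently from plain letter runs. The following sketch shows this; the input "rate2*y" and the variable names are hypothetical, chosen only to illustrate the point.

```python
import re

# Pattern from the answer above: letters, a word boundary, then "not a (".
pattern = re.compile(r'[a-zA-Z]+\b(?!\()')

question_result = pattern.findall("movav(x/2, 2)*movsum(y, 3)*z")
print(question_result)  # ['x', 'y', 'z']

# Hypothetical input: "rate2" mixes letters and a digit. There is no word
# boundary between "e" and "2", so no part of "rate" is matched.
digit_result = pattern.findall("rate2*y")
print(digit_result)  # ['y']
```

Whether skipping "rate" here is desirable depends on how variable names are defined in the formulas being parsed.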

Danica answered Oct 16 '22 18:10


You need to limit matches to whole words. So use \b to match the beginning or end of a word:

re.findall(r"\b[a-zA-Z]+\b(?!\()", "movav(x/2, 2)*movsum(y, 3)*z")
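
A short sketch of what the leading \b changes, compared with a pattern that has only the trailing \b. The input "2x*y" and the names trailing_only and both_sides are assumptions introduced for this example.

```python
import re

trailing_only = re.compile(r'[a-zA-Z]+\b(?!\()')
both_sides = re.compile(r'\b[a-zA-Z]+\b(?!\()')

# On the formula from the question, the whole-word pattern gives the
# desired result.
formula = "movav(x/2, 2)*movsum(y, 3)*z"
formula_result = both_sides.findall(formula)
print(formula_result)  # ['x', 'y', 'z']

# Hypothetical input where a digit directly precedes a letter. There is no
# word boundary between "2" and "x", so the leading \b rejects "x".
sample = "2x*y"
trailing_result = trailing_only.findall(sample)
both_result = both_sides.findall(sample)
print(trailing_result)  # ['x', 'y']
print(both_result)      # ['y']
```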

ekhumoro answered Oct 16 '22 17:10


An alternate approach: find strings of letters followed by either end-of-string or a character that is neither a letter nor an opening parenthesis; then capture the letter portion.

re.findall("([a-zA-Z]+)(?:[^a-zA-Z(]|$)", "movav(x/2, 2)*movsum(y, 3)*z")
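
The sketch below shows how the capturing group interacts with re.findall in this approach. The variable names are assumptions for illustration; the pattern is the one from the answer, written as a raw string.

```python
import re

formula = "movav(x/2, 2)*movsum(y, 3)*z"
pattern = re.compile(r"([a-zA-Z]+)(?:[^a-zA-Z(]|$)")

# With exactly one capturing group, findall returns the group's text,
# not the whole match, so the trailing delimiter is dropped from the result.
captured = pattern.findall(formula)
print(captured)  # ['x', 'y', 'z']

# The whole match still includes the consumed delimiter, if there is one.
whole_matches = [m.group(0) for m in pattern.finditer(formula)]
print(whole_matches)  # ['x/', 'y,', 'z']
```

Unlike the lookahead versions, this pattern consumes the character after each name, which matters only if that character needs to take part in a later match.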

Karl Knechtel answered Oct 16 '22 18:10