Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to possessive quantifier in python

I am trying to match all occurences of the String Article followed by a number (single or more digits) which are not followed by an opening parentheses. In Sublime Text, I am using the following regex:

Article\s[0-9]++(?!\()

to search the following String:

Article 29
Article 30(1)

which does not match Article 30(1) (as I expect it to) but Article 29 and Article 1.

When attempting to do the same in Python (3) using

import re
article_list = re.findall(r'Article\s[0-9]++(?!\()', "Article 30(1)")

I get an the following Error as I am using a (nested) possessive quantifier which is not supported by Python regex. Is there any way to match what I want it (not) to match in Python?

like image 408
user Avatar asked Jun 09 '17 12:06

user


People also ask

What is possessive quantifier regex?

As you'll see in the table below, a quantifier is made possessive by appending a + plus sign to it. Therefore, A++ is possessive—it matches as many characters as needed and never gives any of them back. Whereas the regex A+. matches the string AAA, A++. doesn't.

What is a lazy quantifier in regex?

A lazy quantifier first repeats the token as few times as required, and gradually expands the match as the engine backtracks through the regex to find an overall match. Because greediness and laziness change the order in which permutations are tried, they can change the overall regex match.

What is a possessive quantifier?

A possessive quantifier is similar to greedy quantifier. It indicates the engine to start by checking the entire string.It is different in the sense if it doesn't work, if match failed and there is no looking back. Following are various examples of Possessive Quantifiers using regular expression in java.

What is greedy match in Python?

A greedy match means that the regex engine (the one which tries to find your pattern in the string) matches as many characters as possible.


1 Answers

You can also emulate an atomic group (?>...) around what you want to match, using the (?=(...))\1 workaround:

(?=(Article\s[0-9]+))\1(?!\()

(a lookahead behaves naturally like an a atomic group, all you need is a capture and a backreference)

like image 191
Casimir et Hippolyte Avatar answered Sep 30 '22 08:09

Casimir et Hippolyte