Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python non-greedy regexes

People also ask

What is non-greedy expression in Python?

A non-greedy match means that the regex engine matches as few characters as possible—so that it still can match the pattern in the given string.

What is greedy matching in regular expressions?

Greedy matching. The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient. Instead of matching till the first occurrence of '>', it extracted the whole string.

What is difference between * and in regular expression?

represents any single character (usually excluding the newline character), while * is a quantifier meaning zero or more of the preceding regex atom (character or group). ? is a quantifier meaning zero or one instances of the preceding atom, or (in regex variants that support it) a modifier that sets the quantifier ...

What are Python quantifiers?

The + quantifier matches its preceding element one or more times. For example, the \d+ matches one or more digits. The following example uses the + quantifier to match one or more digits in a string: import re s = "Python 3.10 was released in 2021" matches = re.finditer('\d+', s) for match in matches: print(match)


You seek the all-powerful *?

From the docs, Greedy versus Non-Greedy

the non-greedy qualifiers *?, +?, ??, or {m,n}? [...] match as little text as possible.


>>> x = "a (b) c (d) e"
>>> re.search(r"\(.*\)", x).group()
'(b) c (d)'
>>> re.search(r"\(.*?\)", x).group()
'(b)'

According to the docs:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behavior isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.


Would not \\(.*?\\) work? That is the non-greedy syntax.


Using an ungreedy match is a good start, but I'd also suggest that you reconsider any use of .* -- what about this?

groups = re.search(r"\([^)]*\)", x)

As the others have said using the ? modifier on the * quantifier will solve your immediate problem, but be careful, you are starting to stray into areas where regexes stop working and you need a parser instead. For instance, the string "(foo (bar)) baz" will cause you problems.


Do you want it to match "(b)"? Do as Zitrax and Paolo have suggested. Do you want it to match "b"? Do

>>> x = "a (b) c (d) e"
>>> re.search(r"\((.*?)\)", x).group(1)
'b'