Code:
import re

# named s rather than str, so the built-in str type is not shadowed
s = '<br><br />A<br />B'
print(re.sub(r'<br.*?>\w$', '', s))
It is expected to return '<br><br />A', but it returns an empty string ''!
Any suggestion?
So the difference between the greedy and the non-greedy match is the following: The greedy match will try to match as many repetitions of the quantified pattern as possible. The non-greedy match will try to match as few repetitions of the quantified pattern as possible.
Non-greedy quantifiers match their preceding elements as little as possible to return the smallest possible match. Add a question mark (?) to a quantifier to turn it into a non-greedy quantifier.
You make a quantifier non-greedy by writing ".*?". With this construct, the regex engine will, at every step at which it matches text into the ".", attempt to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?" ...
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
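As a small illustration of the difference described above, here is a minimal sketch. The sample string and variable names are made up for this example and are not taken from the question.

```python
import re

# Hypothetical sample text with several '>' characters.
text = "<a><b><c>"

# Greedy: .* first grabs as much as possible, then gives back
# only what is needed so the final '>' can match.
greedy = re.match(r"<.*>", text).group()

# Lazy: .*? starts with nothing and grows one character at a time,
# so it stops at the first '>' that lets the pattern succeed.
lazy = re.match(r"<.*?>", text).group()

print(greedy)  # <a><b><c>
print(lazy)    # <a>
```

Both patterns start matching at the same position; only how far the quantifier extends differs.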
Greediness works from left to right, but not otherwise. It basically means "don't match unless you failed to match". Here's what's going on:
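1. The regex engine matches <br at the start of the string.
2. .*? is ignored for now, because it is lazy.
3. The engine tries to match >, and succeeds.
4. The engine tries to match \w, and fails. Now it gets interesting: the engine backtracks and sees the .*? rule. In this case, . can match the first >, so there's still hope for that match.
5. The lazy .*? keeps expanding one character at a time. At the end of the second tag, >\w can match, but $ fails. Again, the engine comes back to the lazy .*? rule and keeps matching, until it matches the whole string, <br><br />A<br />B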
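The walkthrough above can be checked by looking at where the match starts and ends. This is only a sketch that reuses the string from the question; the variable names are chosen here for illustration.

```python
import re

s = '<br><br />A<br />B'

# The match is anchored at the FIRST '<br' the engine finds (index 0).
# The lazy .*? then has to expand until '>\w$' finally succeeds.
m = re.search(r'<br.*?>\w$', s)

start, end = m.span()
print(start, end)        # 0 18
print(m.group() == s)    # True: the whole string was matched
```

Because the match covers the entire string, re.sub replaces all of it, which is why the result is empty.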
Luckily, there's an easy solution: by using <br[^>]*>\w$ instead, you don't allow the match to run outside of a single tag, so it should replace only the last occurrence.
Strictly speaking, this doesn't work well for HTML, because tag attributes can contain > characters, but I assume it's just an example.
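Here is a quick sketch of the suggested pattern applied to the string from the question, followed by the attribute caveat. The second input, with a > inside a title attribute, is invented purely to illustrate the limitation.

```python
import re

s = '<br><br />A<br />B'

# [^>]* cannot cross a '>', so each attempt stays inside one tag.
pattern = r'<br[^>]*>\w$'
fixed = re.sub(pattern, '', s)
print(fixed)  # <br><br />A

# Caveat: a '>' inside an attribute value ends [^>]* too early,
# so the last tag is not recognised and nothing is removed.
tricky = '<br />A<br title="a>b" />B'
tricky_result = re.sub(pattern, '', tricky)
print(tricky_result == tricky)  # True: string left unchanged
```

For real HTML, a proper parser is likely a safer choice than a regular expression.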