
Python re.findall non-greedy result

I'm trying to get only the "Text3" part with the following code:

>>> import re
>>> stringtotest = "begin:Text1<wrong>Text2<wrong>Text3<right>Text4<wrong>"
>>> right = re.findall(r"<wrong>(.+?)<right>", stringtotest)
>>> right
['Text2<wrong>Text3']

Why does Python give me Text2 as well? How do I tell it that I want only the part after the nearest "<wrong>"? Thank you.

alexanderk409 asked Apr 24 '26 19:04


2 Answers

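The dot . matches any character (except a newline by default), including <, so .+? can run straight across the second <wrong>. The non-greedy quantifier only makes the match end as early as possible; the regex engine still starts at the leftmost <wrong> it finds. You can use a negated character class to restrict the match: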

<wrong>([^<]+?)<right>
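As a quick check, here is a minimal sketch that applies this pattern to the string from the question. Because [^<] cannot cross another tag, the attempt starting at the first <wrong> fails and the match begins at the nearest one.

```python
import re

stringtotest = "begin:Text1<wrong>Text2<wrong>Text3<right>Text4<wrong>"

# [^<]+? cannot consume "<", so the text between the tags may not
# contain another tag; only the <wrong> closest to <right> can match.
right = re.findall(r"<wrong>([^<]+?)<right>", stringtotest)
print(right)  # ['Text3']
```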

If you want to get the middle section without the outer tags, use lookaheads and lookbehinds to assert the position of the tags:

(?<=<wrong>)([^<]+?)(?=<right>)
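The lookaround version can be tried the same way. In this sketch (the variable names found and match are only illustrative), the tags are asserted but not consumed, so the whole match is just the text between them. Note that Python's re module only accepts fixed-width lookbehinds, which <wrong> satisfies.

```python
import re

stringtotest = "begin:Text1<wrong>Text2<wrong>Text3<right>Text4<wrong>"

pattern = r"(?<=<wrong>)([^<]+?)(?=<right>)"

# findall returns the contents of the single capture group
found = re.findall(pattern, stringtotest)
print(found)  # ['Text3']

# With lookarounds, the full match (group 0) excludes the tags as well
match = re.search(pattern, stringtotest)
print(match.group(0))  # Text3
```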
Vasili Syrakis answered Apr 26 '26 07:04


<wrong>((?:(?!<wrong>).)*)<right>

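You can use a negative lookahead inside the repeated group, so that each character is consumed only if another <wrong> does not start at that position. See the demo: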

https://regex101.com/r/8yUhDL/1
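A small sketch of the pattern above in Python, run against the string from the question. The second test string (other) is a made-up example, not from the question; it illustrates one difference from the [^<] approach, namely that this pattern still allows a bare < inside the text.

```python
import re

stringtotest = "begin:Text1<wrong>Text2<wrong>Text3<right>Text4<wrong>"

# (?:(?!<wrong>).)* consumes one character at a time, but only while
# the current position is not the start of another <wrong>
tempered = r"<wrong>((?:(?!<wrong>).)*)<right>"
print(re.findall(tempered, stringtotest))  # ['Text3']

# Hypothetical input with a literal "<" in the wanted text
other = "a<wrong>x < y<right>"
print(re.findall(tempered, other))                   # ['x < y']
print(re.findall(r"<wrong>([^<]+?)<right>", other))  # []
```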

vks answered Apr 26 '26 07:04


