
regex exclusion in python

Tags:

python

regex

I'm not very good with regex, and I'm looking for the syntax to exclude something. I'm escaping <, >, " and & in HTML code (replacing them with &lt;, etc.), and I need to exclude <br/> from that. For example:

<html><br/>
   <head><title></title></head><br/>
   <body><br/>
   </body><br/>
</html>

I tried things like r'<\b?![br]' and others, but they don't fully work. I use re.sub() to do the replacement.

stdio asked Apr 25 '26 20:04



1 Answer

Ok, now the question is open again, I can do it as an answer, so...

Unless I'm missing something, and as long as it's just <br/> (no variants), you can simply replace <(?!br/>) with &lt; and (?<!<br/)> with &gt;, and that's it.


In Python, it looks like that means this:

import re

text = re.sub(r'<(?!br/>)', '&lt;', text)
text = re.sub(r'(?<!<br/)>', '&gt;', text)
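
As a rough end-to-end sketch of those two substitutions, assuming the input only ever contains the exact form <br/>: the helper name escape_except_br is made up for illustration, and the & and " replacements are an addition taken from the question, not part of the two lines above. The & is handled first so that the entities produced by the later steps are not escaped a second time.

```python
import re

def escape_except_br(text):
    # Hypothetical helper: escapes &, <, > and " but leaves a literal <br/> alone.
    text = text.replace('&', '&amp;')            # first, so later entities survive
    text = re.sub(r'<(?!br/>)', '&lt;', text)    # '<' that does not start '<br/>'
    text = re.sub(r'(?<!<br/)>', '&gt;', text)   # '>' that does not end '<br/>'
    return text.replace('"', '&quot;')

sample = '<b>"a&b"</b><br/>'
print(escape_except_br(sample))
# prints: &lt;b&gt;&quot;a&amp;b&quot;&lt;/b&gt;<br/>
```

The order of the two re.sub() calls does not matter here, because neither replacement string contains a raw < or >.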


To explain what's going on, (?!...) is a negative lookahead - it only successfully matches at a position if the following text does not match the sub-expression it contains.
(Note that lookaheads do not consume the text matched by their sub-expression; they only check whether it is there or not.)

Similarly, (?<!...) is a negative lookbehind, and does the same thing but using the preceding text.
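
A small illustration of that zero-width behaviour, using made-up sample strings: in both cases the match covers only the single character outside the lookaround, not the text the lookaround inspected.

```python
import re

# Negative lookahead: match 'a' only when it is NOT followed by 'b'.
m = re.search(r'a(?!b)', 'ab ac')
print(m.span(), m.group())   # (3, 4) a  -> only the 'a' in 'ac', the 'c' is not consumed

# Negative lookbehind: match 'c' only when it is NOT preceded by 'a'.
found = re.findall(r'(?<!a)c', 'ac bc')
print(found)                 # ['c']  -> only the 'c' in 'bc'
```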

However, lookbehinds differ slightly from lookaheads (in some regex implementations): the sub-expressions inside lookbehinds must represent fixed-width or limited-width matches.

Python is one of the implementations that requires a fixed width. So whilst the above expression works (because <br/ is always four characters), (?<!<br\s*/?)> would not be a valid regex in Python's re module, because it represents a variable-length match. (However, you can stack multiple lookbehinds, so you could list the assorted variants by hand if that were necessary.)
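
A sketch of both points, assuming Python's built-in re module: the variable-width lookbehind is rejected at compile time, while stacking one fixed-width lookbehind per spelling is accepted. The three spellings protected here (<br/>, <br /> and <br>) are only an illustrative choice.

```python
import re

# A variable-width lookbehind is rejected by the built-in re module.
try:
    re.compile(r'(?<!<br\s*/?)>')
    rejected = False
except re.error as exc:
    rejected = True
    print('rejected:', exc)

# Stacked fixed-width lookbehinds, one per spelling to protect
# (the chosen variants are assumptions for illustration).
gt = re.compile(r'(?<!<br/)(?<!<br /)(?<!<br)>')
result = gt.sub('&gt;', 'a > b<br/><br /><br>')
print(result)   # a &gt; b<br/><br /><br>
```

The restriction applies only to lookbehinds; a lookahead such as <(?!br\s*/?>) may be variable-width, so the matching < pattern needs no stacking.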

Peter Boughton answered Apr 27 '26 09:04



