
Python look-behind regex issue: Invalid regular expression: look-behind requires fixed-width pattern

I need to match a linebreak in-between double quotes, as in:

<p class="calibre1">“This is the first sentence.</p>
<p class="calibre1">And this is the second!”</p>

This would match </p> <p class="calibre1">

I got this working with the regex (?<=“[^”]*)</p>\s*<p[^>]*>(?!“), but outside of a manual search it fails with the error in the title: "Invalid regular expression: look-behind requires fixed-width pattern". I need this regex for the eBook management/editing program Calibre, which uses Python for its regex engine. The regex works when I search a book manually, but when I add it as a "common option" (run on each eBook conversion) I get that error.

I don't see how to do this without a variable-width look-behind, since you can't know how far the left double quote is from the linebreak. Help would be much appreciated!

Zout asked Mar 09 '26 23:03


1 Answer

Python's re module, like most regex engines (.NET being a notable exception), doesn't support variable-length look-behind.
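The restriction is easy to reproduce in a plain Python session. This is a minimal sketch using only the standard re module; the helper name `compiles` is made up for illustration, and it says nothing about how Calibre itself wires up its search options.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the standard re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width look-behind (exactly one character): accepted.
fixed_ok = compiles(r'(?<=“)</p>')

# Variable-width look-behind ([^”]* can match any length): rejected.
variable_ok = compiles(r'(?<=“[^”]*)</p>\s*<p[^>]*>(?!“)')

print(fixed_ok, variable_ok)
```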

Can't you use a capturing group instead?

“[^”]*(</p>\s*<p[^>]*>)

The linebreak markup you want to match is then in the first capturing group.
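As a rough sketch of how that could look in plain Python: the variant below adds a second group for the quoted text in front of the break, so it can be put back during replacement, and keeps the (?!“) look-ahead from the question. Both additions are my assumptions and not part of the regex above, and whether Calibre's conversion option accepts a \1 backreference in its replacement field is not verified here.

```python
import re

sample = (
    '<p class="calibre1">“This is the first sentence.</p>\n'
    '<p class="calibre1">And this is the second!”</p>'
)

# Group 1: the opening quote and the text up to the break.
# Group 2: the paragraph break itself.
# No look-behind is needed, so the standard re module accepts it.
pattern = re.compile(r'(“[^”]*)(</p>\s*<p[^>]*>)(?!“)')

match = pattern.search(sample)
break_markup = match.group(2)

# Put group 1 back unchanged and replace only the break with a space.
joined = pattern.sub(r'\1 ', sample)

print(repr(break_markup))
print(joined)
```

Note that one pass only handles one break per quoted passage; a quote spanning three or more paragraphs would need the substitution repeated until nothing changes.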

Robin answered Mar 12 '26 11:03


