
Odd behavior with negative lookbehind in Python

I am trying to do a re.split with a regex that uses lookbehinds. I want to split on newlines that aren't preceded by a \r. To complicate things, I also do NOT want to split on a \n if it's preceded by a certain substring: XYZ.

I can solve my problem by installing the regex module which lets me do variable width groups in my look behind. I'm trying to avoid installing anything, however.

My working regex looks like:

regex.split("(?<!(?:\r|XYZ))\n", s)

And an example string:

s = "DATA1\nDA\r\n \r\n \r\nTA2\nDA\r\nTA3\nDAXYZ\nTA4\nDATA5"

Which when split would look like:

['DATA1', 'DA\r\n \r\n \r\nTA2', 'DA\r\nTA3', 'DAXYZ\nTA4', 'DATA5']

My closest non-working expression without the regex module:

re.split("(?<!(?:..\r|XYZ))\n", s)

But this split results in:

['DATA1', 'DA\r\n \r', ' \r', 'TA2', 'DA\r\nTA3', 'DAXYZ\nTA4', 'DATA5']

This I don't understand: from what I know about lookbehinds, this last expression should work. Any idea how to accomplish this with the base re module?

DivineSlayer asked Jan 31 '26 07:01


1 Answer

You can use:

>>> re.split(r"(?<!\r)(?<!XYZ)\n", s)
['DATA1', 'DA\r\n \r\n \r\nTA2', 'DA\r\nTA3', 'DAXYZ\nTA4', 'DATA5']

Here we have split your single lookbehind into two separate assertions, both of which must hold at the \n:

(?<!\r)  # previous char is not \r
(?<!XYZ) # previous text is not XYZ
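
As a quick check of the two chained assertions above, and of why the padded ..\r attempt from the question misbehaves, here is a minimal sketch. The helper names split_chained and split_padded are made up for illustration. The behavior it relies on is the default re rule that . does not match a newline unless re.DOTALL is set, so the three characters "\n \r" never satisfy ..\r and the lookbehind does not block the split there.

```python
import re

s = "DATA1\nDA\r\n \r\n \r\nTA2\nDA\r\nTA3\nDAXYZ\nTA4\nDATA5"

def split_chained(text):
    # Two independent fixed-width lookbehinds; both must hold at the \n.
    return re.split(r"(?<!\r)(?<!XYZ)\n", text)

def split_padded(text, flags=0):
    # The question's attempt: pad \r to three characters so both
    # alternatives inside the lookbehind have the same width.
    return re.split(r"(?<!(?:..\r|XYZ))\n", text, flags=flags)

chained = split_chained(s)
padded = split_padded(s)                          # '.' skips '\n' by default
padded_dotall = split_padded(s, flags=re.DOTALL)  # '.' now matches '\n' too

print(chained)
print(padded)
print(padded_dotall == chained)
```

Even with re.DOTALL, the padded form still needs three characters in front of the \n, so a \r\n within the first two characters of the string would be split anyway. The chained form has no such edge case, which is one reason to prefer it.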

Python's built-in re module won't accept (?<!(?:\r|XYZ)) as a lookbehind, because the alternatives \r and XYZ have different widths. It fails with this error:

error: look-behind requires fixed-width pattern
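
To see that error for yourself, a small sketch like the following should reproduce it with the standard re module. The helper name compiles is hypothetical, and the exact message wording is what CPython's re has reported in my experience, not something I'd treat as a documented guarantee.

```python
import re

def compiles(pattern):
    """Return (True, None) if re accepts the pattern, else (False, message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)

# Alternatives of width 1 and 3 inside one lookbehind: rejected by re.
print(compiles(r"(?<!(?:\r|XYZ))\n"))
# Alternatives both of width 3: accepted (though '.' will not match '\n').
print(compiles(r"(?<!(?:..\r|XYZ))\n"))
# Two separate fixed-width lookbehinds: accepted.
print(compiles(r"(?<!\r)(?<!XYZ)\n"))
```

The third-party regex module mentioned in the question accepts the first pattern because it supports variable-width lookbehind; the chained form is the way to stay within the base re module.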
anubhava answered Feb 02 '26 22:02


