Somehow I am not able to find anything online about how to make a pattern end at a double \n. My particular case is the following. I have this string:
"1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
And I would like to extract only the text between a digit followed by \n and the next \n\n. So, in my case, I'd like to have
This is said \nby Matt.
While this is said by Lucas
Although I am not very skilled with RegEx, I tried many combinations such as (?<=\d\n).*?(?=\n\n), (?<=\d\n).\n\n and (?<=\d\n).*?(?=\r\n\r\n) but without any luck.
I have tried those as well as others with R's stringr library, but also with Python's re.
The issue first came up in this answer: https://stackoverflow.com/a/72547966/19284124
You can make the . match across lines with the (?s) inline modifier and extend the double newline pattern to alternatively match the end of string:
(?s)(?<=\d\n).*?(?=\n\n|\Z)
See the regex demo.
Details:
(?s) - a flag that allows . to match line break chars
(?<=\d\n) - a positive lookbehind that matches a location immediately preceded by a digit and a newline
.*? - any zero or more chars, as few as possible
(?=\n\n|\Z) - a positive lookahead that matches a location immediately followed by two newline chars or the end of string.
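As a quick check, here is a minimal sketch with Python's re module, using the string from the question. The variable names (text, naive, pattern, matches) are only illustrative. It first runs one of the attempted patterns to show why it finds nothing, then the pattern above.

```python
import re

# The sample string from the question.
text = (
    "1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
    "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
)

# Without (?s), "." stops at every newline, so the two-line first caption
# cannot be matched, and the last caption has no trailing "\n\n" to end on.
naive = re.findall(r"(?<=\d\n).*?(?=\n\n)", text)

# (?s) lets "." cross newlines; \Z lets the final caption end at end of string.
pattern = re.compile(r"(?s)(?<=\d\n).*?(?=\n\n|\Z)")
matches = pattern.findall(text)

print(naive)
for m in matches:
    print(repr(m))
```

For this input the first pattern returns an empty list, while the second returns the two caption texts. The same pattern should carry over to stringr, although \Z behaves slightly differently there when the input ends with a line terminator, so that is worth testing on real data.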