How to match everything up to double newline "\n\n" using regex in Python?

Question

Suppose I have the following Python string

str = """
....
Dummyline

Start of matching
+----------+----------------------------+
+   test   +           1234             +
+   test2  +           5678             +
+----------+----------------------------+

Finish above. Do not match this
+----------+----------------------------+
+  dummy1  +       00000000000          +
+  dummy2  +       12345678910          +
+----------+----------------------------+
"""

and I want to match everything that the first table has. I could use a regex that starts matching from "Start" and matches everything until it finds a double newline.

I found some tips on how to do this in another Stack Overflow post (How to match "anything up until this sequence of characters" in a regular expression?), but they don't seem to work for the double-newline case.

I thought of the following code

pattern = re.compile(r"Start[^\n\n]")
matches = pattern.finditer(str)

where basically [^x] means match everything until character x is found. But this works only for single characters, not for strings ("\n\n" in this case).
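A small sketch of why the character class falls short here: a class such as [^\n\n] matches exactly one character, and listing \n twice inside it does not turn it into a two-character sequence. The sample strings below are made up for illustration.

```python
import re

# A character class matches exactly ONE character; repeating "\n" inside it
# changes nothing, so [^\n\n] is the same class as [^\n].
single = re.compile(r"Start[^\n\n]")
one_char = single.search("Start of matching").group()
print(repr(one_char))  # 'Start ' (only one character after "Start")

# Adding * consumes up to the first newline, not up to the first blank line.
greedy = re.compile(r"Start[^\n]*")
sample = "Start of matching\n+---+\n\nDo not match this"
first_line = greedy.search(sample).group()
print(repr(first_line))  # 'Start of matching'
```

Either way the match stops at or before the first single newline, so the table rows are never reached.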

Does anybody have an idea how to do this?

The fourth bird · Accepted Answer

You can match Start until the end of the line, and then match every following line that starts with a newline which is not immediately followed by another newline, using a negative lookahead (?!\r?\n).

^Start .*(?:\r?\n(?!\r?\n).*)*

Explanation

^Start .* Match Start at the start of a line (in Python, ^ needs re.MULTILINE for this, because Start is not at the very beginning of the example string), then a space and 0+ times any char except a newline
(?: Non capture group
- \r?\n Match a newline
- (?!\r?\n) Negative lookahead, assert what is directly to the right is not a newline
- .* Match 0+ times any character except a newline
)* Close the non capturing group and repeat 0+ times to get all the lines
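As a rough sketch, the pattern can be tried against the string from the question like this. Two things here are assumptions rather than part of the answer: the variable is named text instead of str (to avoid shadowing the built-in), and re.MULTILINE is passed so that ^ can match at the start of the "Start of matching" line instead of only at the start of the string.

```python
import re

# Sample input copied from the question; renamed from `str` to `text`.
text = """
....
Dummyline

Start of matching
+----------+----------------------------+
+   test   +           1234             +
+   test2  +           5678             +
+----------+----------------------------+

Finish above. Do not match this
+----------+----------------------------+
+  dummy1  +       00000000000          +
+  dummy2  +       12345678910          +
+----------+----------------------------+
"""

# re.MULTILINE lets ^ anchor at the beginning of any line.
pattern = re.compile(r"^Start .*(?:\r?\n(?!\r?\n).*)*", re.MULTILINE)

match = pattern.search(text)
block = match.group() if match else ""
print(block)
```

The match stops at the end of the last table row, because the newline that follows it is immediately followed by another newline and the lookahead rejects it.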


Tags:

python

regex

Andrew
