I want to match something in Python that lies between two words or phrases, contains a bullet point, and spans multiple lines, and the pattern should work for any text between the beginning and end markers. I don't know the identifier (escape) for the bullet point, or how to make the regex match everything including line breaks. For example, I'm trying to match:
Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World
in
hello_big_old_world = "qweqrqr Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas"
where this string actually spans multiple lines. I know it's probably not even in the ballpark, but here's what I have so far, and it obviously isn't working:
Answer = re.findall("(?<=qweqrqr)(.*\n?)/s(?=fdsfdas)"), hello_big_old_world)
print(Answer)
Thanks in advance.
You may match the string from qweqrqr to fdsfdas that contains at least one bullet point using
import re
hello_big_old_world = "qweqrqr Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas"
print(re.findall(r'qweqrqr([^\u2022]*\u2022.*?)fdsfdas', hello_big_old_world, re.S))
See the Python 3 demo.
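Since the question's real input spans several lines, here is a small Python 3 check of the same pattern. The multi-line sample text is hypothetical. It shows that re.S is what lets .*? cross the newlines, and that a delimited span without any bullet point is not matched at all:

```python
import re

# Pattern from the answer: left delimiter, a group that must contain
# at least one bullet (U+2022), then the right delimiter.
PATTERN = r'qweqrqr([^\u2022]*\u2022.*?)fdsfdas'

# Hypothetical multi-line input; the bullets sit on different lines.
text = "qweqrqr Hello •\nWorld Hello • World\nHello • World fdsfdas"

# With re.S, .*? may run across the newlines up to fdsfdas.
print(re.findall(PATTERN, text, re.S))

# Without re.S, .*? stops at the first newline, so nothing matches.
print(re.findall(PATTERN, text))

# A delimited span that has no bullet point yields no match.
print(re.findall(PATTERN, "qweqrqr no bullets here fdsfdas", re.S))
```

Note that the negated class [^\u2022]* already matches newlines on its own; the flag only matters for the dot.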
Note that you may use the literal • instead of the \u2022 escape, and you can strip the whitespace around the captured text by adding \s* (zero or more whitespace chars) on both ends of the capturing group:
re.findall(r'qweqrqr\s*([^•]*•.*?)\s*fdsfdas', hello_big_old_world, re.S)
It should work in both Python 3 and Python 2.
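To check both notes in Python 3, here is a quick sketch with a hypothetical sample string: it confirms that the literal • is the same character as \u2022 (which also answers what the bullet point's identifier is), and that the \s* variant leaves the surrounding whitespace, including newlines, out of the capture:

```python
import re
import unicodedata

# The literal bullet and the \u2022 escape denote the same code point.
print(unicodedata.name("•"), hex(ord("•")))  # BULLET 0x2022

# Hypothetical input with whitespace and newlines next to the delimiters.
text = "qweqrqr\n  Hello • World\nHello • World  \nfdsfdas"

# \s* outside the group consumes the leading/trailing whitespace,
# so it does not end up in the captured text.
print(re.findall(r'qweqrqr\s*([^•]*•.*?)\s*fdsfdas', text, re.S))
```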
Details
qweqrqr - matches the left delimiter
([^\u2022]*\u2022.*?) / ([^•]*•.*?) - captures into Group 1 (the string returned by re.findall):
  [^\u2022]* / [^•]* - any chars other than the bullet point
  \u2022 / • - the bullet point
  .*? - any 0+ chars (including newlines, due to the re.S (= re.DOTALL) flag), as few as possible (due to the lazy quantifier *?)
fdsfdas - matches the right delimiter

To match all characters including newlines, you still use the . character, but pass flags=re.DOTALL to functions such as re.findall.
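A minimal illustration of that point, using a hypothetical two-line string: by default the dot stops at the newline, while re.DOTALL (or the equivalent inline (?s) flag) lets it match the newline too:

```python
import re

text = "start\nend"

# Default: '.' matches any character except a newline.
print(re.findall(r"start.end", text))                   # []

# re.DOTALL (alias re.S) makes '.' match newlines as well.
print(re.findall(r"start.end", text, flags=re.DOTALL))  # ['start\nend']

# The inline (?s) flag has the same effect inside the pattern itself.
print(re.findall(r"(?s)start.end", text))               # ['start\nend']
```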
You can use your regex with slight changes:
- /s should be \s.
- Use re.DOTALL to match cases where there are newlines in between.
- Drop the stray ) right after the pattern string; it closes the re.findall call too early.
Working code:
import re
hello_big_old_world = 'qweqrqr Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas'
# Raw string, so \s reaches the regex engine instead of being an invalid string escape
Answer = re.findall(r"(?<=qweqrqr)(.*\n?)\s(?=fdsfdas)", hello_big_old_world, re.DOTALL)
print(Answer)
# [' Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World']
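The sample string above is a single line, so here is a sketch (Python 3, hypothetical inputs) running the corrected lookaround pattern on text that really spans several lines. It also shows one caveat: the greedy .* runs to the last fdsfdas when the delimiters occur more than once, whereas a lazy .*? (the variant named PATTERN_LAZY here is an illustrative tweak, not part of the answer) stops at the first one:

```python
import re

# Corrected pattern from the answer (greedy .*).
PATTERN_GREEDY = r"(?<=qweqrqr)(.*\n?)\s(?=fdsfdas)"
# Hypothetical lazy variant; \n? is dropped because DOTALL already covers newlines.
PATTERN_LAZY = r"(?<=qweqrqr)(.*?)\s(?=fdsfdas)"

multi_line = "qweqrqr Hello • World\nHello • World\nHello • World fdsfdas"
print(re.findall(PATTERN_GREEDY, multi_line, re.DOTALL))

# Two delimited spans in one string.
two_spans = "qweqrqr A • B fdsfdas qweqrqr C • D fdsfdas"
print(re.findall(PATTERN_GREEDY, two_spans, re.DOTALL))  # one long match
print(re.findall(PATTERN_LAZY, two_spans, re.DOTALL))    # one match per span
```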