Python Regex Bullet Point and multiple line match

Question

I want to match, in Python, the text between two words or phrases when that text contains a bullet point and spans multiple lines, and the match should work for any variation of the words between the start and end phrases. I don't know the identifier for the bullet point character, or how to match everything including line breaks. For example, I'm trying to match:

Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World

in

 hello_big_old_world = "qweqrqr  Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas"

The real string spans multiple lines. I know it's probably not in the ballpark, but here's what I have so far, and obviously it isn't working:

Answer = re.findall("(?<=qweqrqr)(.*\n?)/s(?=fdsfdas)"), hello_big_old_world)
print(Answer)

Thanks in advance.

Wiktor Stribiżew · Accepted Answer

You may match the text between qweqrqr and fdsfdas that contains at least one bullet point using

import re

hello_big_old_world = "qweqrqr  Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas"
print(re.findall(r'qweqrqr([^\u2022]*\u2022.*?)fdsfdas', hello_big_old_world, re.S))

See the Python 3 demo.
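As a sketch of the multi-line case the question describes, the same pattern can be run with and without re.S. The sample string below is hypothetical: it is the question's text with line breaks inserted between the pairs.

```python
import re

# Hypothetical multi-line sample: the question's text with line breaks added.
text = "qweqrqr  Hello • World\nHello • World\nHello • World fdsfdas"

pattern = r'qweqrqr([^\u2022]*\u2022.*?)fdsfdas'

# With re.S, '.' also matches '\n', so the lazy '.*?' can cross line breaks.
with_dotall = re.findall(pattern, text, re.S)

# Without re.S, '.*?' cannot pass the first newline, so nothing matches.
without_dotall = re.findall(pattern, text)

print(with_dotall)     # ['  Hello • World\nHello • World\nHello • World ']
print(without_dotall)  # []
```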

Note that you may use a literal • instead of the \u2022 escape, and you can strip the whitespace from the captured text by adding \s* (=0+ whitespace chars) on both sides of the capturing group:

re.findall(r'qweqrqr\s*([^•]*•.*?)\s*fdsfdas', hello_big_old_world, re.S)

It should work in both Python 3 and Python 2.
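Here is a small Python 3 sketch of the trimmed variant on a hypothetical two-line input. It also shows that the literal • and the \u2022 escape are the same character, and how the standard unicodedata module can look up a symbol's code point and name when you don't know its identifier:

```python
import re
import unicodedata

# Hypothetical two-line input.
text = "qweqrqr  Hello • World\nHello • World fdsfdas"

# \s* before and after the capturing group keeps the surrounding
# whitespace out of Group 1; the bullet is written as a literal here.
trimmed = re.findall(r'qweqrqr\s*([^•]*•.*?)\s*fdsfdas', text, re.S)

# The literal character and the \u2022 escape are the same code point.
same_char = '•' == '\u2022'
code_point = f"U+{ord('•'):04X}"
char_name = unicodedata.name('•')

print(trimmed)     # ['Hello • World\nHello • World']
print(same_char)   # True
print(code_point)  # U+2022
print(char_name)   # BULLET
```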

Details

qweqrqr - matches the left (starting) delimiter
([^\u2022]*\u2022.*?) / ([^•]*•.*?) - captures into Group 1 (the string returned by re.findall)
- [^\u2022]* / [^•]* - any chars other than the bullet point
- \u2022 / • - the bullet point
- .*? - any 0+ chars (including newlines, due to the re.S (=re.DOTALL) flag), as few as possible (due to the lazy quantifier *?)
fdsfdas - matches the right (ending) delimiter
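To see the parts described above in action, here is a sketch with hypothetical inputs: one without a bullet, and one with two delimited blocks. For comparison it also runs a greedy .* version, which is not part of the answer's pattern.

```python
import re

pattern = r'qweqrqr([^\u2022]*\u2022.*?)fdsfdas'
greedy_pattern = r'qweqrqr([^\u2022]*\u2022.*)fdsfdas'  # for comparison only

# Hypothetical inputs chosen to exercise each part of the pattern.
no_bullet = "qweqrqr Hello World fdsfdas"
two_blocks = "qweqrqr A • B fdsfdas junk qweqrqr C • D fdsfdas"

# [^\u2022]*\u2022 demands at least one bullet between the delimiters.
no_bullet_matches = re.findall(pattern, no_bullet, re.S)

# The lazy .*? stops at the first fdsfdas after each bullet,
# so each delimited block is captured on its own.
lazy_matches = re.findall(pattern, two_blocks, re.S)

# A greedy .* runs on to the last fdsfdas instead.
greedy_matches = re.findall(greedy_pattern, two_blocks, re.S)

print(no_bullet_matches)  # []
print(lazy_matches)       # [' A • B ', ' C • D ']
print(greedy_matches)     # [' A • B fdsfdas junk qweqrqr C • D ']
```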

Alex Hall · Answer

To match all characters including newlines, you still use the . character, but pass flags=re.DOTALL to functions such as re.findall.
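A minimal illustration on a hypothetical three-line string. It also shows the inline (?s) flag, which is equivalent to passing re.DOTALL:

```python
import re

text = "start\nmiddle\nend"

# By default '.' does not match '\n', so (.*) cannot span the line breaks.
default_matches = re.findall(r'start(.*)end', text)

# flags=re.DOTALL makes '.' match newlines as well.
dotall_matches = re.findall(r'start(.*)end', text, flags=re.DOTALL)

# An inline (?s) at the start of the pattern sets the same flag.
inline_matches = re.findall(r'(?s)start(.*)end', text)

print(default_matches)  # []
print(dotall_matches)   # ['\nmiddle\n']
print(inline_matches)   # ['\nmiddle\n']
```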

Austin · Answer

You can use your regex with slight changes:

- /s should be \s.
- Remove the stray ) right after the pattern string.
- Use re.DOTALL to match cases where there are newlines in between.

Working code:

import re

hello_big_old_world = 'qweqrqr  Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World fdsfdas'

Answer = re.findall(r"(?<=qweqrqr)(.*\n?)\s(?=fdsfdas)", hello_big_old_world, re.DOTALL)
print(Answer)

# ['  Hello • World Hello • World Hello • World Hello • World Hello • World Hello • World']
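A quick check with hypothetical inputs suggests that, thanks to re.DOTALL, this pattern also handles text split across lines. However, unlike a pattern containing [^•]*•, it does not require a bullet point between the delimiters:

```python
import re

# The lookaround pattern from this answer, written as a raw string.
pattern = r"(?<=qweqrqr)(.*\n?)\s(?=fdsfdas)"

# Hypothetical inputs: one spanning two lines, one with no bullet at all.
multi_line = "qweqrqr  Hello • World\nHello • World fdsfdas"
no_bullet = "qweqrqr  Hello World fdsfdas"

multi_line_matches = re.findall(pattern, multi_line, re.DOTALL)
no_bullet_matches = re.findall(pattern, no_bullet, re.DOTALL)

print(multi_line_matches)  # ['  Hello • World\nHello • World']
print(no_bullet_matches)   # ['  Hello World']
```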


Tags: python, regex

crooose
