 

Regular expression to return all characters between two strings

Tags:

python

regex

How can I design a regular expression that will capture all the characters between two strings? Specifically, from this big string:

Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]

I want to extract all the characters between [^title= and ], that is, Fish consumption and incidence of stroke: a meta-analysis of cohort studies and The second title.

I think I will have to use re.findall(), and that I can start with this: re.findall(r'\[([^]]*)\]', big_string), which will give me all the matches between the square brackets [ ], but I'm not sure how to extend it.
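One way to extend that pattern, sketched here under the assumption that titles never contain a ] themselves: keep the negated character class, but move the literal [^title= prefix outside the capturing group so only the text after it is returned. The variable big_string is just the sample text from the question, and [^\]] is the same set as [^]], with the bracket escaped for readability.

```python
import re

# Sample text from the question, split across lines for readability.
big_string = (
    "Studies have shown that...[^title=Fish consumption and incidence of stroke: "
    "a meta-analysis of cohort studies]... Another experiment demonstrated that... "
    "[^title=The second title]"
)

# \[\^title=  matches the literal prefix "[^title=" (outside the group).
# ([^\]]*)    captures any run of characters that are not "]".
# \]          matches the closing bracket that ends each title.
titles = re.findall(r"\[\^title=([^\]]*)\]", big_string)
print(titles)
```

Because the class excludes ], each match ends at the first closing bracket, so this prints the two titles without the [^title= prefix.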

asked by bard, Mar 03 '26 03:03



1 Answer

>>> import re
>>> text = "Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]"
>>> re.findall(r"\[\^title=(.*?)\]", text)
['Fish consumption and incidence of stroke: a meta-analysis of cohort studies', 'The second title']

Here is a breakdown of the regex:

\[ is an escaped [ character.

\^ is an escaped ^ character.

title= matches the literal text title=.

(.*?) matches any run of characters (except newlines), as few as possible, and captures them in a group, which is what findall returns. Because the quantifier is non-greedy, the match stops as soon as it finds a...

\], which is an escaped ] character.
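The same idea generalizes to any pair of delimiter strings. The sketch below uses a hypothetical helper, between(), that is not part of the answer above: it passes both delimiters through re.escape so characters like [ ^ ] are matched literally, and adds re.DOTALL (an assumption, not in the original pattern) so a match may span newlines. The greedy flag exists only to show what the non-greedy ? buys you.

```python
import re


def between(text, start, end, greedy=False):
    """Return every substring of text found between start and end.

    Hypothetical helper for illustration only. start and end are plain
    strings; re.escape makes any regex metacharacters in them literal.
    """
    # ".*?" stops at the first following `end`; ".*" runs to the last one.
    quantifier = ".*" if greedy else ".*?"
    pattern = re.escape(start) + "(" + quantifier + ")" + re.escape(end)
    # re.DOTALL lets "." also match newlines, so a match may span lines.
    return re.findall(pattern, text, flags=re.DOTALL)


text = "x [^title=First] y [^title=Second] z"
lazy_result = between(text, "[^title=", "]")
greedy_result = between(text, "[^title=", "]", greedy=True)
print(lazy_result)
print(greedy_result)
```

With the default non-greedy quantifier each match ends at the first ], giving ['First', 'Second']. With greedy=True the single match runs to the last ] in the string, giving ['First] y [^title=Second'], which is why the answer uses .*? rather than .*.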

answered by icedtrees, Mar 05 '26 16:03
