How can I write a regular expression that captures all the characters between two strings? Specifically, from this big string:
Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]
I want to extract all the characters between [^title= and ], that is, "Fish consumption and incidence of stroke: a meta-analysis of cohort studies" and "The second title".
I think I will have to use re.findall(), and I can start with this: re.findall(r'\[([^]]*)\]', big_string), which gives me everything between the square brackets [ ]. I'm not sure how to extend it.
>>> import re
>>> text = "Studies have shown that...[^title=Fish consumption and incidence of stroke: a meta-analysis of cohort studies]... Another experiment demonstrated that... [^title=The second title]"
>>> re.findall(r"\[\^title=(.*?)\]", text)
['Fish consumption and incidence of stroke: a meta-analysis of cohort studies', 'The second title']
Here is a breakdown of the regex:
\[ is an escaped [ character.
\^ is an escaped ^ character.
title= matches title=
(.*?) matches any characters (other than newlines, by default), non-greedily, and puts them in a group (for findall to extract). Being non-greedy, it stops at the first...
\], which is an escaped ] character.
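To see why the non-greedy quantifier matters, here is a small sketch comparing it with two variants. The negated character class version is an alternative built on the pattern from the question, not part of the answer itself, and the variable names (lazy, negated, greedy) are only illustrative.

```python
import re

text = (
    "Studies have shown that...[^title=Fish consumption and incidence of "
    "stroke: a meta-analysis of cohort studies]... Another experiment "
    "demonstrated that... [^title=The second title]"
)

# Non-greedy version from the answer: stops at the first "]" after each title.
lazy = re.findall(r"\[\^title=(.*?)\]", text)

# Alternative that extends the question's own pattern: a negated character
# class matches anything except "]", so it also spans newlines inside a title.
negated = re.findall(r"\[\^title=([^\]]*)\]", text)

# Greedy ".*" runs on to the last "]" in the string, so both titles
# (and the text between them) end up in a single match.
greedy = re.findall(r"\[\^title=(.*)\]", text)

print(lazy)
print(negated)
print(len(greedy))
```

On this input the first two patterns return the same two titles, while the greedy one returns a single long match. If titles might contain line breaks, the negated class (or passing re.DOTALL with the non-greedy pattern) is probably the safer choice.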