I am looking to use regex to extract text that occurs between two strings. I know how to do this if I want to extract between the same strings every time (and there are countless questions asking for this, e.g. Regex matching between two strings?), but I want to do it using variables that change and may themselves include characters that are special in regex. (I want any special characters, e.g. *, treated as literal text.)
For example, if I had:
text = "<b*>Test</b>"
left_identifier = "<b*>"
right_identifier = "</b>"
I would want to build a regex that results in the following code being run:
re.findall(r'<b\*>(.*)<\/b>', text)
It is the <b\*>(.*)<\/b> part that I don't know how to create dynamically.
To extract the part of a string between two different characters, you can do this: select the cell where you want the result, type the formula =MID(LEFT(A1,FIND(">",A1)-1),FIND("<",A1)+1,LEN(A1)), and press Enter. Note: A1 is the cell containing the text, and > and < are the two characters you want to extract the string between.
(?i) makes the regex case insensitive. (?c) makes the regex case sensitive.
The \r metacharacter matches carriage return characters.
You can do something like this:
import re
pattern_string = re.escape(left_identifier) + "(.*?)" + re.escape(right_identifier)
pattern = re.compile(pattern_string)
The escape function will automatically escape special characters. For example (in Python 3.7 and later; earlier versions also escape < and >):
>>> import re
>>> print(re.escape("<b*>"))
<b\*>
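Putting the pieces together, here is a minimal sketch of how this might look with the example values from the question. The helper name extract_between is hypothetical (it is not part of the re module), and the sketch assumes the non-greedy (.*?) group is what you want, so that each match stops at the nearest right identifier.

```python
import re


def extract_between(text, left_identifier, right_identifier):
    """Return every substring found between the two literal delimiters.

    extract_between is a hypothetical helper name used for illustration.
    """
    # re.escape makes each delimiter match literally, so "*" is not
    # treated as a quantifier.
    pattern = re.compile(
        re.escape(left_identifier) + "(.*?)" + re.escape(right_identifier)
    )
    # findall returns the contents of the single capture group per match.
    return pattern.findall(text)


text = "<b*>Test</b>"
print(extract_between(text, "<b*>", "</b>"))
```

Note that . does not match newlines by default; if the text between the identifiers can span several lines, passing re.DOTALL to re.compile is one option.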