Regex to extract between two strings (which are variables)

Tags:

python

regex

python-2.7

I want to use a regex to extract the text that occurs between two strings. I know how to do this when the delimiters are the same every time (countless questions already cover that, e.g. "Regex matching between two strings?"), but I want to do it with variables that change and that may themselves contain regex special characters. I want any special characters, e.g. *, to be treated as literal text.

For example, if I had:

text = "<b*>Test</b>"
left_identifier = "<b*>"
right_identifier = "</b>"

I would want to build a regex that results in the following code being run:

re.findall('<b\*>(.*)<\/b>',text)

It is the <b\*>(.*)<\/b> part that I don't know how to create dynamically.

asked Apr 15 '15 17:04

kyrenia

1 Answer

You can do something like this:

import re
pattern_string = re.escape(left_identifier) + "(.*?)" + re.escape(right_identifier)
pattern = re.compile(pattern_string)
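
To show the compiled pattern in use, here is a minimal sketch that wraps the same idea in a helper. The function name `extract_between` and its `greedy` flag are hypothetical, introduced only for illustration. It also contrasts the non-greedy `(.*?)` used above with the greedy `(.*)` from the question, which only differ when the text contains more than one delimited span.

```python
import re

def extract_between(text, left_identifier, right_identifier, greedy=False):
    """Return every substring found between the two literal delimiters."""
    # re.escape makes each delimiter match literally, so "*" is not a quantifier.
    middle = "(.*)" if greedy else "(.*?)"
    pattern = re.compile(
        re.escape(left_identifier) + middle + re.escape(right_identifier)
    )
    return pattern.findall(text)

text = "<b*>one</b> and <b*>two</b>"
print(extract_between(text, "<b*>", "</b>"))               # ['one', 'two']
print(extract_between(text, "<b*>", "</b>", greedy=True))  # ['one</b> and <b*>two']
```

By default `.` does not match a newline, so if the text between the delimiters can span several lines, passing `re.DOTALL` to `re.compile` is one option.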

re.escape returns a copy of the string with the regex special characters backslash-escaped, so they are matched as literal text. For example (Python 2.7):

>>> import re
>>> print re.escape("<b*>")
\<b\*\>
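
The exact text returned by `re.escape` depends on the Python version: the output above is what Python 2.7 prints, while Python 3.7 and later escape only characters that can be special in a regex, so `<` and `>` are left alone. The matching behaviour is the same either way. The sketch below, which assumes Python 3.4+ for `re.fullmatch` and uses a hypothetical helper name `matches_literally`, checks that behaviour without depending on the escaped spelling.

```python
import re

def matches_literally(delimiter, candidate):
    """True if candidate is exactly the delimiter, using an escaped pattern."""
    return re.fullmatch(re.escape(delimiter), candidate) is not None

left_identifier = "<b*>"

# Escaped: the pattern matches only the literal string "<b*>".
print(matches_literally(left_identifier, "<b*>"))  # True
print(matches_literally(left_identifier, "<bb>"))  # False

# Unescaped: "*" is a quantifier on "b", so the pattern matches "<bb>"
# but not the literal "<b*>".
print(re.fullmatch(left_identifier, "<bb>") is not None)  # True
print(re.fullmatch(left_identifier, "<b*>") is not None)  # False
```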

answered Oct 09 '22 05:10

Alexandru Chirila
