Substituting a regex only when it doesn't match another regex (Python)

Tags:

Long story short, I have two regex patterns. One pattern matches things that I want to replace, and the other pattern matches a special case of those patterns that should not be replace. For a simple example, imagine that the first one is "\{.*\}" and the second one is "\{\{.*\}\}". Then "{this}" should be replaced, but "{{this}}" should not. Is there an easy way to take a string and say "substitute all instances of the first string with "hello" so long as it is not matching the second string"?

In other words, is there a way to make a regex that is "matches the first string but not the second" easily without modifying the first string? I know that I could modify my first regex by hand to never match instances of the second, but as the first regex gets more complex, that gets very difficult.

789

asked May 20 '09 16:05

So8res

2 Answers

Using negative look-ahead/behind assertion

pattern = re.compile( "(?<!\{)\{(?!\{).*?(?<!\})\}(?!\})" )
pattern.sub( "hello", input_string )

Negative look-ahead/behind assertion allows you to compare against more of the string, but is not considered as using up part of the string for the match. There is also a normal look ahead/behind assertion which will cause the string to match only if the string IS followed/preceded by the given pattern.

That's a bit confusing looking, here it is in pieces:

"(?<!\{)"  #Not preceded by a {
"\{"       #A {
"(?!\{)"   #Not followed by a {
".*?"      #Any character(s) (non-greedy)
"(?<!\})"  #Not preceded by a } (in reference to the next character)
"\}"       #A }
"(?!\})"   #Not followed by a }

So, we're looking for a { without any other {'s around it, followed by some characters, followed by a } without any other }'s around it.

By using negative look-ahead/behind assertion, we condense it down to a single regular expression which will successfully match only single {}'s anywhere in the string.

Also, note that * is a greedy operator. It will match as much as it possibly can. If you use "\{.*\}" and there is more than one {} block in the text, everything between will be taken with it.

"This is some example text {block1} more text, watch me disappear {block2} even more text"

becomes

"This is some example text hello even more text"

instead of

"This is some example text hello more text, watch me disappear hello even more text"

To get the proper output we need to make it non-greedy by appending a ?.

The python docs do a good job of presenting the re library, but the only way to really learn is to experiment.

104

answered Oct 16 '22 02:10

spbogie

You can give replace a function (reference)

But make sure the first regex contain the second one. This is just an example:

regex1 = re.compile('\{.*\}')
regex2 = re.compile('\{\{.*\}\}')

def replace(match):
    match = match.group(0)
    if regex2.match(match):
        return match
    return 'replacement'


regex1.sub(replace, data)

answered Oct 16 '22 02:10

Nadia Alramli

Related questions
                            
                                Could not find module (or one of its dependencies). Try using the full path with constructor syntax
                            
                                Set all markers to the same fixed size in Plotly Express scatterplot
                            
                                Calling class method inside string format
                            
                                Is there a way to get the number of occurrences of the last value in a groupby?
                            
                                Sorting by data in another Dataframe
                            
                                textcat -> architecture extra fields not permitted
                            
                                How to programmatically ensure that a function includes a return statement?
                            
                                How to extract country from a string in python
                            
                                Sort pandas df subset of rows (within a group) by specific column
                            
                                Similar random number generation in python and c++ but getting different output
                            
                                import error 'force_text' from 'django.utils.encoding'
                            
                                Significant figures in the decimal module
                            
                                What is the best way to change text contained in an XML file using Python?
                            
                                Filtering models with ReferenceProperties
                            
                                Is there a way in python to apply a list of regex patterns that are stored in a list to a single string?
                            
                                XMP image tagging and Python
                            
                                Passing self to class functions in Python [duplicate]
                            
                                Python server side AJAX library?
                            
                                Efficient way of Solving Cryptarithms
                            
                                Is site-packages appropriate for applications or just libraries?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Substituting a regex only when it doesn't match another regex (Python)

Tags:

python

regex

So8res

People also ask

2 Answers

spbogie

Nadia Alramli

Recent Activity

Donate For Us