Long story short, I have two regex patterns. One pattern matches things that I want to replace, and the other matches a special case of those things that should not be replaced. For a simple example, imagine that the first one is "\{.*\}" and the second one is "\{\{.*\}\}". Then "{this}" should be replaced, but "{{this}}" should not. Is there an easy way to take a string and say "substitute every match of the first pattern with 'hello', as long as it does not match the second pattern"?
In other words, is there an easy way to build a regex that "matches the first pattern but not the second" without modifying the first pattern? I know that I could rewrite my first regex by hand so that it never matches instances of the second, but as the first regex gets more complex, that gets very difficult.
By default, re.sub() replaces every occurrence of the pattern in the target string. Passing count=1 to re.sub() replaces only the first occurrence. Set count to the maximum number of replacements you want to perform.
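As a small illustration of the count argument, here is a sketch using a made-up sample string (the variable names are only for this example):

```python
import re

text = "{a} {b} {c}"

# Default behaviour: every match is replaced.
all_replaced = re.sub(r"\{.*?\}", "hello", text)

# count=1: only the first match is replaced.
first_only = re.sub(r"\{.*?\}", "hello", text, count=1)

print(all_replaced)  # hello hello hello
print(first_only)    # hello {b} {c}
```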
To replace text in a string with a regex in Python, use re.sub(). It is a function in the standard-library re module that returns a new string with the replacements applied, so don't forget to import re. It searches the string for the pattern and replaces each match with the given replacement.
In .NET, by comparison, you perform a substitution with the Replace method of the Regex class instead of the Match method. It is similar to Match, except that it takes an extra string parameter for the replacement value.
Using negative look-ahead/behind assertion
import re
pattern = re.compile(r"(?<!\{)\{(?!\{).*?(?<!\})\}(?!\})")  # raw string keeps the backslashes intact
pattern.sub("hello", input_string)  # input_string is the text to process
A negative look-ahead/look-behind assertion lets you test the text around the current position without that text being consumed as part of the match. There is also a positive look-ahead/look-behind assertion, which makes the match succeed only if the position IS followed/preceded by the given pattern.
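To make the difference concrete, here is a minimal sketch on a made-up string. It only looks at opening braces and reports where each variant matches:

```python
import re

text = "a{b {{c"  # indices: '{' at 1, 4 and 5

# Negative look-ahead: a "{" that is NOT followed by another "{".
no_brace_after = [m.start() for m in re.finditer(r"\{(?!\{)", text)]

# Adding a negative look-behind also rejects the second "{" of "{{".
lone_brace = [m.start() for m in re.finditer(r"(?<!\{)\{(?!\{)", text)]

# Positive look-ahead: a "{" that IS followed by another "{".
first_of_pair = re.search(r"\{(?=\{)", text)

print(no_brace_after)         # [1, 5]
print(lone_brace)             # [1]
print(first_of_pair.group())  # { (the second brace is checked but not consumed)
```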
The full pattern is a bit confusing to look at, so here it is in pieces:
"(?<!\{)" #Not preceded by a {
"\{" #A {
"(?!\{)" #Not followed by a {
".*?" #Any character(s) (non-greedy)
"(?<!\})" #Not preceded by a } (in reference to the next character)
"\}" #A }
"(?!\})" #Not followed by a }
So, we're looking for a { without any other {'s around it, followed by some characters, followed by a } without any other }'s around it.
By using negative look-ahead/behind assertion, we condense it down to a single regular expression which will successfully match only single {}'s anywhere in the string.
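The same breakdown can be kept inside the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments in the regex. This is a sketch; the name single_braces and the sample text are just for illustration:

```python
import re

single_braces = re.compile(r"""
    (?<!\{)   # not preceded by a {
    \{        # a {
    (?!\{)    # not followed by a {
    .*?       # any characters, as few as possible
    (?<!\})   # not preceded by a } (relative to the next character)
    \}        # a }
    (?!\})    # not followed by a }
""", re.VERBOSE)

result = single_braces.sub("hello", "{this} and {{this}}")
print(result)  # hello and {{this}}
```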
Also, note that * is a greedy operator: it will match as much as it possibly can. If you use "\{.*\}" and there is more than one {} block in the text, everything between the first { and the last } will be taken with it.
"This is some example text {block1} more text, watch me disappear {block2} even more text"
becomes
"This is some example text hello even more text"
instead of
"This is some example text hello more text, watch me disappear hello even more text"
To get the proper output we need to make it non-greedy by appending a ?, which gives "\{.*?\}".
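Running both variants on the example sentence shows the difference (a quick sketch; the variable names are only for this example):

```python
import re

text = ("This is some example text {block1} more text, "
        "watch me disappear {block2} even more text")

# Greedy: .* runs from the first { to the last }.
greedy = re.sub(r"\{.*\}", "hello", text)

# Non-greedy: .*? stops at the first } it can.
lazy = re.sub(r"\{.*?\}", "hello", text)

print(greedy)  # This is some example text hello even more text
print(lazy)    # This is some example text hello more text, watch me disappear hello even more text
```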
The Python docs do a good job of presenting the re module, but the only way to really learn is to experiment.
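You can pass a function as the replacement argument to sub() (reference). It is called with each match object and returns the text to put in its place.
But make sure that every match of the second regex is also fully covered by a match of the first one. This is just an example: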
import re

regex1 = re.compile(r'\{.*\}')
regex2 = re.compile(r'\{\{.*\}\}')

def replace(match):
    text = match.group(0)
    # Leave the match untouched if it is the special case.
    if regex2.match(text):
        return text
    return 'replacement'

regex1.sub(replace, data)  # data is the input string
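A related approach, not taken from the answer above, is to combine both patterns with alternation and try the exception first. The callback then keeps the text whenever the exception branch matched. The sketch below uses a hypothetical helper sub_except and the non-greedy versions of the two example patterns. It assumes neither pattern uses a group named "keep" or numbered backreferences, since wrapping the patterns shifts group numbers.

```python
import re

def sub_except(pattern, exception, repl, text):
    """Replace matches of `pattern` with `repl`, but leave matches of `exception` alone."""
    # The exception is tried first at each position; the "keep" group is set when it wins.
    combined = re.compile(f"(?P<keep>{exception})|(?:{pattern})")

    def choose(match):
        if match.group("keep") is not None:
            return match.group(0)  # special case: put the original text back
        return repl                # ordinary match: substitute

    return combined.sub(choose, text)

result = sub_except(r"\{.*?\}", r"\{\{.*?\}\}", "hello", "{this} and {{this}}")
print(result)  # hello and {{this}}
```

Neither pattern has to be edited by hand here, which is the point of the original question. Note that repl is returned as a literal string, so backreferences such as \1 in it are not expanded.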