I would like a Python regular expression that matches a given word only when it is not between single quotes. I've tried to use a negative lookahead, (?! ...), but without success.
In the sample text below, I would like to match every foe except the one inside the quoted string on the 4th line.
Also, the text is given as one big string.
Here is the regex101 link, and the sample text is below:
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe
The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.
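To make that caveat concrete, here is a minimal sketch using the same pattern this answer relies on. The helper name replace_outside_quotes and the tricky input are made up for illustration: the apostrophe in the comment is treated as the start of a string literal, so the quoting gets out of sync and the foe inside the real string is replaced.

```python
import re

# Same pattern as in this answer: Group 1 captures a single-quoted literal,
# the second alternative matches the whole word foe anywhere else.
pattern = re.compile(r"('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b")

def replace_outside_quotes(text, new_word="NEWORD"):
    # Hypothetical helper: keep captured literals, replace every other match.
    return pattern.sub(lambda m: m.group(1) if m.group(1) else new_word, text)

# Balanced quotes: the quoted foe is left alone.
print(replace_outside_quotes("bar = 'foe'"))

# Hypothetical input with an apostrophe in a comment: the "literal" now runs
# from the apostrophe in don't to the opening quote of the real string.
tricky = "# don't\nbar = 'foe'"
print(replace_outside_quotes(tricky))
```

If comments like this can occur in your input, a regex alone is probably not enough and a real tokenizer for the language would be safer.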
A usual regex trick for matching text in a specific context is to match and capture what you need to keep, and to just match (without capturing) what you need to replace.
Here is a sample Python demo:
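import re

# Group 1 captures a single-quoted string literal (escape sequences allowed);
# the second alternative matches the target word anywhere else.
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""
toReplace = "foe"
# re.escape keeps the word literal even if it contains regex metacharacters.
# Quoted literals (Group 1) are put back unchanged; other matches are replaced.
res = re.sub(rx.format(re.escape(toReplace)), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)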
See the Python demo
The regex will look like
('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b
See the regex demo.
The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; if it matches, the literal is simply put back into the result. The \bfoe\b part matches the whole word foe in any other context, and that match is then replaced with another word.
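If you only need to find the occurrences rather than replace them, the same idea should work with re.finditer: skip every match where Group 1 took part. This is just a sketch; the function name find_unquoted is my own, not something from the re module.

```python
import re

def find_unquoted(word, text):
    """Return (start, end) spans of `word` found outside single-quoted literals."""
    pattern = re.compile(
        r"('[^'\\]*(?:\\.[^'\\]*)*')|\b" + re.escape(word) + r"\b"
    )
    # A match with Group 1 set is a quoted literal, so it is skipped.
    return [m.span() for m in pattern.finditer(text) if m.group(1) is None]

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

for start, end in find_unquoted("foe", sample):
    # Derive the 1-based line number from the newlines before the match.
    line_no = sample.count("\n", 0, start) + 1
    print(line_no, sample[start:end])
```

On the sample text this reports the foe on lines 1, 2, 4 (the one before the quotes) and 5.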
NOTE: To also match double quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")".
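Here is a rough sketch of how the extended pattern could be put together so that both quote styles are protected. The names SINGLE, DOUBLE and replace_unquoted are only illustrative.

```python
import re

# One sub-pattern per quote style, each allowing backslash escapes.
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'

def replace_unquoted(word, new_word, text):
    # Group 1 captures either kind of literal; the word is matched elsewhere.
    pattern = re.compile(
        "(" + SINGLE + "|" + DOUBLE + r")|\b" + re.escape(word) + r"\b"
    )
    # Put literals back unchanged; replace the word everywhere else.
    return pattern.sub(lambda m: m.group(1) if m.group(1) else new_word, text)

print(replace_unquoted("foe", "NEWORD", 'a = "foe" + \'foe\' + foe'))
```

Only the last, unquoted foe is replaced here; both quoted ones are kept.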
You can try this:
((?!\'[\w\s]*)foe(?![\w\s]*\'))
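Before relying on this pattern, it may be worth checking it against the sample from the question. As far as I can tell, the trailing lookahead only rejects foe when nothing but word characters and whitespace sits between it and the next quote, so punctuation such as the ! in the sample gets in the way. A quick check, where the simple string is a made-up example:

```python
import re

pattern = re.compile(r"((?!\'[\w\s]*)foe(?![\w\s]*\'))")

# Made-up case: only letters and spaces between foe and the closing quote.
simple = "dark_vador = 'bad foe guy'"

# Sample text from the question.
sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

print(pattern.findall(simple))       # the quoted foe is skipped here
print(len(pattern.findall(sample)))  # counts every foe, including the quoted one
```

On the question's sample the count includes the foe inside the quotes on the 4th line, so this pattern seems to cover only the simplest quoted strings.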