I am trying to remove a substring using regex in Python.
The substring could be the entire string, at the beginning, in the middle, or at the end.
The goal is that the resulting string should not have extra spaces where the substring existed.
Could you suggest a simple and efficient regex that achieves this?
Here are examples of the scenarios*, and my expected results:
'before remove after' --> 'before after' (separated by single space)
'remove after' --> 'after' (no space)
'before remove' --> 'before' (no space)
'remove' --> '' (no space, empty string)
* before, remove, and after may themselves internally contain any character (letters, numbers, spaces, etc.).
Here are a couple of my attempts, but I could not get all scenarios to work:
import re
s1 = 'before remove after'
s2 = 'remove after'
s3 = 'before remove'
s4 = 'remove'
# (1) Just replace with empty string ''...
re.sub(r'remove', '', s1)
'before  after' # <-- bad (two spaces in result)
re.sub(r'remove', '', s2)
' after' # <-- bad (space in the beginning)
re.sub(r'remove', '', s3)
'before ' # <-- bad (space at the end)
re.sub(r'remove', '', s4)
'' # <-- good (empty string)
# (2) Capture the "before" part excluding space suffixes,
# capture the "after" part excluding space prefixes,
# and recombine them with a single space...
re.sub(r'(.*?)\s*remove\s*(.*?)', '\\1 \\2', s1)
'before after' # <-- good (single space)
re.sub(r'(.*?)\s*remove\s*(.*?)', '\\1 \\2', s2)
' after' # <-- bad (space in the beginning)
re.sub(r'(.*?)\s*remove\s*(.*?)', '\\1 \\2', s3)
'before ' # <-- bad (space at the end)
re.sub(r'(.*?)\s*remove\s*(.*?)', '\\1 \\2', s4)
' ' # <-- bad (should be an empty string)
Try this:
import re
s = 'before remove after'
s1 = 'remove after'
s2 = 'before remove'
s3 = 'remove'
print(re.sub(r"(remove\s?)|(\sremove)","",s))
print(re.sub(r"(remove\s?)|(\sremove)","",s1))
print(re.sub(r"(remove\s?)|(\sremove)","",s2))
print(re.sub(r"(remove\s?)|(\sremove)","",s3))
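As a quick check, the pattern above can be wrapped in a small helper and run against the four scenarios from the question. This is only a sketch: the helper name strip_word and the cases dictionary are made up for illustration. Keep in mind that each match consumes at most one neighbouring whitespace character, so runs of several spaces around remove are not collapsed.

```python
import re

# Pattern from the answer above: "remove" plus at most one adjacent whitespace char.
PATTERN = re.compile(r"(remove\s?)|(\sremove)")

def strip_word(text):
    """Delete every match of PATTERN from text (hypothetical helper)."""
    return PATTERN.sub("", text)

# The four scenarios from the question, mapped to the expected results.
cases = {
    "before remove after": "before after",
    "remove after": "after",
    "before remove": "before",
    "remove": "",
}

for source, expected in cases.items():
    result = strip_word(source)
    status = "ok" if result == expected else "unexpected"
    print(repr(source), "->", repr(result), status)
```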
If you want a pattern that works without a lambda, you can use a capturing group in the replacement. That group should contain a single space when remove is surrounded by words, or nothing at all when remove is surrounded only by optional spaces.
(?:(?<=\S)( )+)? *remove *(?(1) (?=\S)(?!remove\b))
Explanation
(?: Non-capturing group
(?<=\S) Positive lookbehind, assert that what is directly to the left is a non-whitespace char
( )+ Capture group 1, repeat 1+ times matching a space; only the value of the last iteration is captured, which is what we need in the replacement
)? Close the non-capturing group and make it optional
 *remove * Match remove between optional spaces
(?(1) (?=\S)(?!remove\b)) Conditional: if group 1 exists, match a space, asserting that what is directly to the right is a non-whitespace char but not the word remove
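To see what group 1 holds in the different scenarios, you can inspect the match object directly. This is a minimal sketch using three of the inputs from the question; the variable names are made up for illustration. When remove sits between two words, group 1 is a single space. Otherwise group 1 does not participate in the match, and since Python 3.5 re.sub replaces an unmatched \1 with an empty string.

```python
import re

pattern = re.compile(r"(?:(?<=\S)( )+)? *remove *(?(1) (?=\S)(?!remove\b))")

for text in ("before remove after", "before remove", "remove after"):
    m = pattern.search(text)
    # group(0) is the whole matched span, group(1) is the captured space (or None).
    print(repr(text), "matched", repr(m.group(0)), "group 1 =", repr(m.group(1)))
```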
Example code
import re
strings = [
    'before remove after',
    'remove after',
    ' remove',
    'remove ',
    ' remove ',
    'before remove',
    'remove',
    'before remove after',
    'before remove after remove before',
    'before remove after remove before remove',
    'before remove after remove before remove ',
    'after remove before before remove remove remove',
    'remove remove remove '
]
pattern = r"(?:(?<=\S)( )+)? *remove *(?(1) (?=\S)(?!remove\b))"
for s in strings:
    print("'{0}' ==> '{1}'".format(s, re.sub(pattern, r"\1", s)))
Output (between single quotes to show the empty strings)
'before remove after' ==> 'before after'
'remove after' ==> 'after'
' remove' ==> ''
'remove ' ==> ''
' remove ' ==> ''
'before remove' ==> 'before'
'remove' ==> ''
'before remove after' ==> 'before after'
'before remove after remove before' ==> 'before after before'
'before remove after remove before remove' ==> 'before after before'
'before remove after remove before remove ' ==> 'before after before'
'after remove before before remove remove remove' ==> 'after before before'
'remove remove remove ' ==> ''
Note
If you want to match whitespace chars that could possibly also match a newline, you can use \s instead of a space.
If you want to match whitespace chars without a newline instead of a space only, you can use [^\S\r\n]
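If the word to remove or the whitespace class varies, the pattern can be assembled from parts. The following is a sketch under a few assumptions: build_pattern and remove_word are hypothetical helper names, ws must be a regex fragment that matches exactly one whitespace character, and the word is passed through re.escape so that characters such as . are treated literally. The \b in the lookahead only behaves as in the original pattern when the word ends in a word character.

```python
import re

def build_pattern(word, ws=" "):
    # ws: regex fragment for ONE whitespace char, for example " ", r"\s" or r"[^\S\r\n]".
    w = re.escape(word)  # treat the word as literal text
    return re.compile(
        rf"(?:(?<=\S)({ws})+)?{ws}*{w}{ws}*(?(1){ws}(?=\S)(?!{w}\b))"
    )

def remove_word(text, word, ws=" "):
    # Group 1 is either one whitespace char or unmatched (substituted as '').
    return build_pattern(word, ws).sub(r"\1", text)

print(repr(remove_word("before remove after", "remove")))
print(repr(remove_word("before\tremove\tafter", "remove", ws=r"\s")))
print(repr(remove_word("x a.b y", "a.b")))
```

With ws=r"\s" the replacement keeps the last whitespace character captured before the word, so a tab stays a tab rather than becoming a space.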