Let's say I have a string like this:
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
and I want to turn it into
'(xy09 and foobar or (abc123 and something))'
then - in this particular case - I could simply do
s.replace('X_', "")
which gives the desired output.
However, in my actual data there might be not only X_ but also other prefixes, so the above replace statement does not work.
What I would need instead is a replacement of
a capital letter followed by an underscore and an arbitrary sequence of letters and numbers
by
everything after the first underscore.
So, to extract the desired elements I could use:
import re
print(re.findall('[A-Z]{1}_[a-zA-Z0-9]+', s))
which prints
['X_xy09', 'X_foobar', 'X_abc123', 'X_something']
how can I now replace those elements so that I obtain
'(xy09 and foobar or (abc123 and something))'
?
If you need to remove an uppercase ASCII letter with an underscore after it, only when not preceded with a word char and when followed with an alphanumeric char, you may use
import re
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
print(re.sub(r'\b[A-Z]_([a-zA-Z0-9])', r'\1', s))
See the Python demo and a regex demo.
Pattern details
\b - a leading word boundary [A-Z]_ - an ASCII uppercase letter and _([a-zA-Z0-9]) - Group 1 (later referenced to with \1 from the replacement pattern): 1 alphanumeric char.If you just need to replace a capital letter followed by an underscore, you can use the regular expression r'[A-Z]_'.
s = '(X_xy09 and X_foobar or (X_abc123 and X_something))'
re.sub(r'[A-Z]_', '', s)
You may need to add to it if you have other criteria not mentioned. (For example, some of your target values follow a word boundary and some follow parentheses.) The above might give you the wrong output if you have input like XY_something. It depends on what you expect the output to be.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With