This is sort of a follow-up to the "Python regex - Replace single quotes and brackets" thread.
The task:
Sample input strings:
RSQ(name['BAKD DK'], name['A DKJ'])
SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])
Desired outputs:
XYZ(BAKD DK, A DKJ)
XYZ(BAKD DK, A DKJ, S QRT)
The number of name['something']-like items is variable.
The current solution:
Currently, I'm doing it through two separate re.sub() calls:
>>> import re
>>>
>>> s = "RSQ(name['BAKD DK'], name['A DKJ'])"
>>> s1 = re.sub(r"^(\w+)", "XYZ", s)
>>> re.sub(r"name\['(.*?)'\]", r"\1", s1)
'XYZ(BAKD DK, A DKJ)'
The question:
Would it be possible to combine these two re.sub() calls into a single one?
In other words, I want to replace something at the beginning of the string and then multiple similar things after, all of that in one go.
I've looked into the regex module. Its ability to capture repeated patterns looks very promising; I tried using regex.subf() but failed to make it work.
You can indeed use the regex module and repeated captures. The main interest is that you can check the structure of the matched string:
import regex

regO = regex.compile(r'''
    \w+ \(
    (?:
        name\['([^']*)']
        (?: ,[ ] | (?=\)) )
    )*
    \)
    ''', regex.VERBOSE)

regO.sub(lambda m: 'XYZ(' + (', '.join(m.captures(1))) + ')', s)

(Note that you can replace "name" by \w+ or anything you want without problems.)
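If installing the third-party regex module is not an option, a similar single-pass result seems achievable with the standard re module by combining both patterns with alternation and passing a callback to re.sub(). This is only a sketch: the names PATTERN and rewrite are made up for illustration, and unlike the pattern above it does not check the overall structure of the string.

```python
import re

# Either the leading identifier (group 1 stays unset) or one name['...'] item.
PATTERN = re.compile(r"^\w+|name\['(.*?)'\]")


def rewrite(s, new_name="XYZ"):
    """Replace the leading name and unwrap every name['...'] in one pass."""
    def repl(m):
        # group(1) is None only when the first alternative (^\w+) matched.
        if m.group(1) is None:
            return new_name
        return m.group(1)

    return PATTERN.sub(repl, s)


print(rewrite("RSQ(name['BAKD DK'], name['A DKJ'])"))
```

The callback decides per match which replacement applies, which is what the two separate calls were doing in sequence.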
Please do not do this in any code I have to maintain.
You are trying to parse syntactically valid Python. Use ast for that. It's more readable, easier to extend to new syntax, and won't fall apart on some weird corner case.
Working sample:
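from ast import parse

l = [
    "RSQ(name['BAKD DK'], name['A DKJ'])",
    "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"
]

for item in l:
    tree = parse(item)
    # Written for Python < 3.9, where each subscript index is wrapped in
    # ast.Index; on Python 3.9+ the string is arg.slice.value directly.
    args = [arg.slice.value.s for arg in tree.body[0].value.args]

    output = "XYZ({})".format(", ".join(args))
    print(output)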
Prints:
XYZ(BAKD DK, A DKJ)
XYZ(BAKD DK, A DKJ, S QRT)
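A slightly more defensive variant of the same idea might look like the sketch below. It assumes Python 3.9 or later (where a subscript's index is stored directly on .slice), and the function name rewrite_call is hypothetical. It rejects anything that is not a call whose arguments are all string subscripts, rather than failing with an AttributeError somewhere in the middle.

```python
import ast


def rewrite_call(source, new_name="XYZ"):
    """Turn "F(name['a'], name['b'])" into "XYZ(a, b)" (Python 3.9+ assumed)."""
    # mode="eval" parses a single expression; invalid syntax raises SyntaxError.
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call: %r" % source)

    values = []
    for arg in call.args:
        # Each argument must look like something['string literal'].
        is_str_subscript = (
            isinstance(arg, ast.Subscript)
            and isinstance(arg.slice, ast.Constant)
            and isinstance(arg.slice.value, str)
        )
        if not is_str_subscript:
            raise ValueError("unexpected argument in %r" % source)
        values.append(arg.slice.value)

    return "{}({})".format(new_name, ", ".join(values))


for item in [
    "RSQ(name['BAKD DK'], name['A DKJ'])",
    "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])",
]:
    print(rewrite_call(item))
```

Raising ValueError on unexpected shapes is a design choice here; silently skipping such arguments would also be possible, but would hide malformed input.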