
Replacing repeated captures


This is a follow-up to the "Python regex - Replace single quotes and brackets" thread.

The task:

Sample input strings:

    RSQ(name['BAKD DK'], name['A DKJ'])
    SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])

Desired outputs:

    XYZ(BAKD DK, A DKJ)
    XYZ(BAKD DK, A DKJ, S QRT)

The number of name['something']-like items is variable.

The current solution:

Currently, I'm doing it through two separate re.sub() calls:

    >>> import re
    >>>
    >>> s = "RSQ(name['BAKD DK'], name['A DKJ'])"
    >>> s1 = re.sub(r"^(\w+)", "XYZ", s)
    >>> re.sub(r"name\['(.*?)'\]", r"\1", s1)
    'XYZ(BAKD DK, A DKJ)'

The question:

Would it be possible to combine these two re.sub() calls into a single one?

In other words, I want to replace something at the beginning of the string and then multiple similar things after, all of that in one go.


I've looked into the regex module: its ability to capture repeated patterns looks very promising. I tried using regex.subf() but failed to make it work.

alecxe asked May 23 '16 01:05


2 Answers

You can indeed use the regex module and repeated captures. The main advantage is that the pattern also checks the structure of the whole matched string:

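    import regex

    s = "RSQ(name['BAKD DK'], name['A DKJ'])"

    # Group 1 sits inside a repeated group, so it matches once per name['...'] item.
    regO = regex.compile(r'''
        \w+ \( (?: name\['([^']*)'] (?: ,[ ] | (?=\)) ) )* \)
        ''', regex.VERBOSE)

    # m.captures(1) returns every match of group 1, not only the last one.
    regO.sub(lambda m: 'XYZ(' + ', '.join(m.captures(1)) + ')', s)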

(Note that you can replace "name" with \w+ or any other subpattern without problems.)
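For comparison, here is a rough sketch that uses only the standard-library re module. It illustrates two things: a repeated group in re retains only its last match (which is why the answer above reaches for captures() from the regex module), and a single re.sub() call is still possible by combining an alternation with a replacement function. The helper name to_xyz is hypothetical, and unlike the pattern above this variant does not check the overall structure of the string.

```python
import re

s = "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"

# Same structure as the verbose pattern above, written for the stdlib re module.
structured = re.compile(r"\w+\((?:name\['([^']*)'\](?:, |(?=\))))*\)")

# With re, a group inside a repetition keeps only its final match.
last_only = structured.fullmatch(s).group(1)


def to_xyz(text):
    """Hypothetical helper: rewrite the call in a single re.sub() pass."""
    # Either the leading function name (group 1 unset) or a name['...'] item.
    pattern = r"^\w+|name\['([^']*)'\]"
    return re.sub(
        pattern,
        lambda m: "XYZ" if m.group(1) is None else m.group(1),
        text,
    )


print(last_only)
print(to_xyz(s))
```

The alternation is tried left to right at each position, so the function name is replaced once at the start of the string and every later match is a name['...'] item.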

Casimir et Hippolyte answered Sep 23 '22 07:09


Please do not do this in any code I have to maintain.

You are trying to parse syntactically valid Python. Use ast for that. It's more readable, easier to extend to new syntax, and won't fall apart on some weird corner case.

Working sample:

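    from ast import parse

    samples = [
        "RSQ(name['BAKD DK'], name['A DKJ'])",
        "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"
    ]

    for item in samples:
        tree = parse(item)
        # Each argument is a Subscript node such as name['BAKD DK'].
        # On Python 3.9+ arg.slice is the Constant node itself;
        # on older versions use arg.slice.value.s instead.
        args = [arg.slice.value for arg in tree.body[0].value.args]

        output = "XYZ({})".format(", ".join(args))
        print(output)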

Prints:

    XYZ(BAKD DK, A DKJ)
    XYZ(BAKD DK, A DKJ, S QRT)
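The same idea can be made a little more defensive. The following is only a sketch, assuming Python 3.9 or later (where a subscript's slice is the key node itself); rewrite_call is a hypothetical helper name, not something the ast module provides. It rejects input that is not a plain call whose arguments all look like name['key'].

```python
import ast


def rewrite_call(source, new_name="XYZ"):
    """Hypothetical helper: turn F(name['a'], name['b']) into new_name(a, b)."""
    # mode="eval" parses a single expression, so .body is the expression node.
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("not a call expression: {!r}".format(source))

    keys = []
    for arg in call.args:
        if not isinstance(arg, ast.Subscript):
            raise ValueError("unexpected argument in {!r}".format(source))
        # Assumes Python 3.9+: arg.slice is the node for the key itself.
        # literal_eval raises ValueError if the key is not a plain literal.
        key = ast.literal_eval(arg.slice)
        if not isinstance(key, str):
            raise ValueError("non-string key in {!r}".format(source))
        keys.append(key)

    return "{}({})".format(new_name, ", ".join(keys))


print(rewrite_call("RSQ(name['BAKD DK'], name['A DKJ'])"))
print(rewrite_call("SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"))
```

Raising ValueError on unexpected shapes keeps malformed input from silently producing a wrong result, which is the main thing the regex-only versions cannot guarantee.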
Kevin answered Sep 23 '22 07:09