
Replace named captured groups with arbitrary values in Python

I need to replace the value inside a capture group of a regular expression with some arbitrary value. I've had a look at re.sub, but it seems to work in a different way.

I have a string like this one:

s = 'monthday=1, month=5, year=2018'

and I have a regex matching it with named capture groups, like the following:

regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

Now I want to replace the group named d with aaa, the group named m with bbb and the group named Y with ccc, like in the following example:

'monthday=aaa, month=bbb, year=ccc'

Basically, I want to keep all the text outside the capture groups and substitute the text matched by each group with some arbitrary value.

Is there a way to achieve the desired result?

Note

This is just an example. I could have other input regexes with a different structure, but with the same capture group names.

Update

Since most people seem to be focusing on the sample data, here is another sample. Let's say I have this other input data and regex:

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'

As you can see, I still have the same number of named capture groups (3) and they are named the same way, but the structure is totally different. What I need, as before, is to replace each named capture group with some arbitrary text:

'ccc-bbb-aaa'

That is, replace the capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.

If regexes are not the best tool for the job, I'm open to other proposals that achieve my goal.

aleroot asked Nov 20 '17 16:11


3 Answers

This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.

Since you've written your regex the wrong way, you have to do most of the substitution operation manually:

import re

def replace_groups(pattern, string, replacements):
    """
    Replaces the text captured by named groups.
    """
    pattern = re.compile(pattern)
    # create a dict of {group_index: group_name} for use later
    groupnames = {index: name for name, index in pattern.groupindex.items()}

    def repl(match):
        # we have to split the matched text into chunks we want to keep and
        # chunks we want to replace
        # captured text will be replaced. uncaptured text will be kept.
        text = match.group()
        # match.start(i) and match.end(i) are positions in the whole string,
        # so shift them to be relative to the matched text
        offset = match.start()
        chunks = []
        lastindex = 0
        for i in range(1, pattern.groups+1):
            groupname = groupnames.get(i)
            # skip unnamed groups, groups without a replacement and groups
            # that did not take part in the match
            if groupname not in replacements or match.start(i) == -1:
                continue

            # keep the text between this group and the previous one
            chunks.append(text[lastindex:match.start(i) - offset])
            # then instead of the captured text, insert the replacement text for this group
            chunks.append(replacements[groupname])
            lastindex = match.end(i) - offset
        chunks.append(text[lastindex:])
        # join all the chunks to obtain the final string with replacements
        return ''.join(chunks)

    # for each occurrence call our custom replacement function
    return pattern.sub(repl, string)
>>> replace_groups(regex, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
'monthday=aaa, month=bbb, year=ccc'
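
The same idea can be sketched more compactly by asking the match object for the span of each named group and splicing the replacements in from right to left, so that earlier positions stay valid. This is only a sketch: the name `replace_named_groups` is hypothetical, and it assumes the named groups do not overlap or nest inside one another.

```python
import re

def replace_named_groups(pattern, string, replacements):
    """Replace the text captured by each named group listed in `replacements`."""
    compiled = re.compile(pattern)

    def repl(match):
        text = match.group()
        offset = match.start()  # spans are relative to the whole string
        spans = []
        for name, value in replacements.items():
            # ignore names the pattern does not define and groups that did not match
            if name in compiled.groupindex and match.start(name) != -1:
                start, end = match.span(name)
                spans.append((start - offset, end - offset, value))
        # splice from right to left so the remaining spans are not shifted
        for start, end, value in sorted(spans, reverse=True):
            text = text[:start] + value + text[end:]
        return text

    return compiled.sub(repl, string)

values = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
print(replace_named_groups(
    r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})',
    'Data: monthday=1, month=5, year=2018', values))
print(replace_named_groups(
    r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))', '2018-12-12', values))
```

Both sample regexes from the question go through the same function, and the first call also covers a match that does not start at the beginning of the string.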
Aran-Fey answered Oct 13 '22 18:10


You can use string formatting with a regex substitution:

import re
s = 'monthday=1, month=5, year=2018'
s = re.sub(r'(?<==)\d+', '{}', s).format(*['aaa', 'bbb', 'ccc'])

Output:

'monthday=aaa, month=bbb, year=ccc'

Edit: for the second sample, substitute each run of digits and pass the replacement values in the order they appear in the string (substituting the whole regex with '{}' would collapse the entire date into a single placeholder):

input = '2018-12-12'
new_s = re.sub(r'\d+', '{}', input).format(*["ccc", "bbb", "aaa"])
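
One caveat with str.format is that any literal braces already present in the input would be read as placeholders. A minimal sketch that sidesteps this feeds the values in order through a replacement function instead. The helper name `fill_in_order` is hypothetical, and it assumes the values are listed in the order the matches appear:

```python
import re

def fill_in_order(pattern, string, values):
    """Replace successive matches of `pattern` with successive items of `values`."""
    remaining = iter(values)
    # each match consumes the next value; once the values run out,
    # later matches are left unchanged
    return re.sub(pattern, lambda m: next(remaining, m.group()), string)

print(fill_in_order(r'(?<==)\d+', 'monthday=1, month=5, year=2018', ['aaa', 'bbb', 'ccc']))
print(fill_in_order(r'\d+', '{2018}-12-12', ['ccc', 'bbb', 'aaa']))
```

Like the format-based version, this works by position rather than by group name, so it depends on knowing the order of the fields in the input.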
Ajax1234 answered Oct 13 '22 18:10


Extended Python 3.x solution on extended example (re.sub() with replacement function):

import re

d = {'d':'aaa', 'm':'bbb', 'Y':'ccc'}  # predefined dict of replace words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    pair = next(t for t in m.groupdict().items() if t[1])
    k = next(filter(None, m.groups()))  # preceding `key` for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=')
    return k + d.get(pair[0], '')

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)

print(result)

The output:

Data: year=ccc, monthday=aaa, month=bbb, some other text

For Python 2.7, change the line k = next(filter(None, m.groups())) to:

k = filter(None, m.groups())[0]
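
A possible variation on the same alternation idea, sketched under the assumption that each key prefix has a fixed length: lookbehinds keep the prefixes out of the match, and match.lastgroup names the alternative that matched, so the replacement function becomes a plain dictionary lookup. The name `replacements` is only illustrative.

```python
import re

replacements = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}  # group name -> replacement text

# each alternative matches only the value; the key is asserted by a lookbehind
pat = re.compile(
    r'(?<=monthday=)(?P<d>\d{1,2})'
    r'|(?<=month=)(?P<m>\d{1,2})'
    r'|(?<=year=)(?P<Y>20\d{2})'
)

def repl(match):
    # exactly one named group takes part in each match, and lastgroup is its name
    return replacements[match.lastgroup]

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)
print(result)
```

Because the prefixes are never consumed, nothing has to be glued back onto the replacement text.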
RomanPerekhrest answered Oct 13 '22 20:10