Python: how to replace substrings in a string given list of indices

Tags: python, string

I have a string:

"A XYZ B XYZ C"

and a list of index-tuples:

((2, 5), (8, 11))

I would like to replace each substring defined by an index pair with the sum of that pair's two indices:

A 7 B 19 C

I can't do it with str.replace, because it would match both instances of XYZ. Replacing by index breaks on the second and every later iteration, because the indices shift as each replacement changes the length of the string.

Is there a nice solution for the problem?

UPDATE: The string is only an example. I don't know its contents in advance, nor can I rely on them in the solution.

My dirty solution is:

text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))

offset = 0
for rpl in replace_list:
    l = rpl[0] + offset
    r = rpl[1] + offset

    # Sum the original indices; l and r already include the offset
    replacement = str(rpl[0] + rpl[1])
    text = text[0:l] + replacement + text[r:]

    offset += len(replacement) - (r - l)

This relies on the index tuples being in ascending order. Could it be done more cleanly?

Denis Kulagin asked Jul 27 '17 11:07


4 Answers

Imperative and stateful:

s = 'A XYZ B XYZ C'
indices = ((2, 5), (8, 11))
res = []
i = 0
for start, end in indices:
    res.append(s[i:start] + str(start + end))
    i = end
res.append(s[i:])  # the tail after the last pair; also works when indices is empty
print(''.join(res))

Result:

A 7 B 19 C
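
The same slicing idea can be wrapped in a reusable function. This is only a sketch: the name replace_spans is hypothetical, the pairs are assumed not to overlap, and the sorted() call is an addition so that the order of the pairs no longer matters.

```python
def replace_spans(text, spans):
    """Replace each (start, end) slice of text with str(start + end)."""
    parts = []
    pos = 0  # end of the previously replaced span
    for start, end in sorted(spans):
        parts.append(text[pos:start])    # untouched text before the span
        parts.append(str(start + end))   # the replacement itself
        pos = end
    parts.append(text[pos:])             # untouched tail after the last span
    return "".join(parts)


print(replace_spans("A XYZ B XYZ C", ((8, 11), (2, 5))))  # A 7 B 19 C
```

Because every slice is taken from the original string, no offset bookkeeping is needed.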

Mike Müller answered Sep 25 '22 13:09


You can use re.sub():

In [16]: import re

In [17]: s = "A XYZ B XYZ C"

In [18]: ind = ((2, 5), (8, 11))

In [19]: inds = map(sum, ind)

In [20]: re.sub(r'XYZ', lambda _: str(next(inds)), s)
Out[20]: 'A 7 B 19 C'

Note, however, that if there are more matches than index pairs, next() raises StopIteration. To avoid that, pass a default value as the second argument to next(); it is used as the replacement once the sums run out.
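
A minimal sketch of that default-argument variant. The extra third XYZ in the string and the choice to fall back to the matched text itself are assumptions made for illustration:

```python
import re

s = "A XYZ B XYZ C XYZ"      # three matches, but only two index pairs
ind = ((2, 5), (8, 11))
inds = map(sum, ind)


def repl(match):
    # Once the sums are exhausted, next() returns the default,
    # here the matched text, so the substring is left unchanged.
    return str(next(inds, match.group(0)))


result = re.sub(r"XYZ", repl, s)
print(result)  # A 7 B 19 C XYZ
```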

If you want to use the index tuples themselves to locate the substrings, here is another solution:

In [81]: flat_ind = tuple(i for sub in ind for i in sub)
# Build consecutive (start, end) pairs that cover the whole string, including the gaps between the given pairs.
In [82]: inds = [(0, ind[0][0]), *zip(flat_ind, flat_ind[1:]), (ind[-1][-1], len(s))]
# Replace a slice with the sum of its indices if the pair is one of the given pairs; otherwise keep the substring itself.
In [85]: ''.join([str(i+j) if (i, j) in ind else s[i:j] for i, j in inds])
Out[85]: 'A 7 B 19 C'

Mazdak answered Sep 25 '22 13:09


One way to do this is with itertools.groupby.

from itertools import groupby


indices = ((2, 5), (8, 11))
data = list("A XYZ B XYZ C")

We start by replacing each matched range with an equal number of None values.

for a, b in indices:
    data[a:b] = [None] * (b - a)

print(data)
# ['A', ' ', None, None, None, ' ', 'B', ' ', None, None, None, ' ', 'C']

Then we loop over the grouped data and replace each group of None values with the sum of the next pair from indices.

it = iter(indices)
output = []
for k, g in groupby(data, lambda x: x is not None):
    if k:
        output.extend(g)
    else:
        output.append(str(sum(next(it))))

print(''.join(output))
# A 7 B 19 C

Ashwini Chaudhary answered Sep 22 '22 13:09


Here's a quick and slightly dirty solution using string formatting and tuple unpacking:

s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
totals = (sum(r) for r in reps)
print(s.replace('XYZ', '{}').format(*totals))

This prints:

A 7 B 19 C

First, we use a generator expression to find the totals for each of our replacements. Then, by replacing 'XYZ' with '{}' we can use string formatting - *totals will ensure we get the totals in the correct order.

Edit

I didn't realise the indices were actually string indices - my bad. To do this, we could use re.sub as follows:

import re
s = 'A XYZ B XYZ C'

reps = ((2, 5), (8, 11))
for a, b in reps:
    s = s[:a] + '~'*(b-a) + s[b:]
totals = (sum(r) for r in reps)
print(re.sub(r'(~+)', r'{}', s).format(*totals))

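This assumes there are no tildes (~) in your string; if there are, use a different placeholder character. It also assumes that none of the "replacement" groups are consecutive, since adjacent runs of tildes would be matched as a single group, and that the string contains no literal braces, which str.format would try to interpret.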
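
These assumptions can be sidestepped by replacing the slices from right to left, so that the indices still to be processed stay valid. This is a different technique from the placeholder approach above, shown only as a sketch: the function name replace_spans_reversed is hypothetical and the spans are assumed not to overlap.

```python
def replace_spans_reversed(text, spans):
    """Replace each (start, end) slice of text with str(start + end)."""
    # Go from the rightmost span to the leftmost: a replacement only moves
    # the text to its right, so the remaining (smaller) indices stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + str(start + end) + text[end:]
    return text


print(replace_spans_reversed("A XYZ B XYZ C", ((2, 5), (8, 11))))  # A 7 B 19 C
# Adjacent spans and a literal tilde in the text are both handled:
print(replace_spans_reversed("A~XYZXYZ", ((2, 5), (5, 8))))  # A~713
```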

asongtoruin answered Sep 23 '22 13:09