
Applying a dictionary of string replacements to a list of strings

Tags:

python

Say I have a list of strings and a dictionary specifying replacements:

E.g.

my_replacements = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

and a list of strings, where each string may contain keys from the above dictionary, e.g.:

['I own 1/2 bottle', 'Give me 3/4 of the profit']

How can I apply the replacements to the list? What would be a Pythonic way to do this?

Josh asked Apr 28 '14 14:04



2 Answers

a = ['I own 1/2 bottle', 'Give me 3/4 of the profit']
b = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}

def replace(x):
    for what, new in b.items(): # or iteritems in Python 2
        x = x.replace(what, new)
    return x

print(list(map(replace, a)))

Output:

['I own half bottle', 'Give me three quarters of the profit']
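
One thing to keep in mind with chained str.replace calls is that each call sees the output of the previous one, so a replacement value that happens to contain another key gets replaced again. A minimal sketch of that effect, assuming Python 3.7+ (where dictionaries keep insertion order); the helper name apply_replacements and the dictionary chained are hypothetical and only serve as illustration:

```python
def apply_replacements(text, replacements):
    """Apply one str.replace call per key, in dictionary order."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# Hypothetical dictionary: the value of the first entry contains the second key.
chained = {'1/2': 'half', 'half': '50%'}

result = apply_replacements('I own 1/2 bottle', chained)
print(result)  # I own 50% bottle
```

With the dictionary from the question this does not happen, because none of the replacement values contains a key.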
vaultah answered Oct 04 '22 12:10


O(n) solution:

reps = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
li = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

list(map(lambda s: ' '.join([reps.get(w,w) for w in s.split()]), li)) # list() is needed in Python 3, where map returns an iterator
Out[6]: ['I own half bottle', 'Give me three quarters of the profit']

#for those who don't like `map`, the list comp version:
[' '.join([reps.get(w,w) for w in sentence.split()]) for sentence in li]
Out[9]: ['I own half bottle', 'Give me three quarters of the profit']

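The issue with making lots of replace calls in a loop is that every call rescans the whole string, so the work grows with the number of replacement keys times the length of the text (roughly O(k*n) for k keys and a text of length n) instead of with the length of the text alone. Not a big deal when you have a replacement dict of length 3, but when it gets large, you suddenly have a slow algorithm that doesn't need to be.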
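
For comparison, another way to avoid one replace call per key is a single regular-expression pass. This is a sketch that swaps in re.sub with an alternation pattern; it is not the approach shown above. The names pattern and replace_all are hypothetical, and the sketch assumes a non-empty dictionary:

```python
import re

reps = {'1/2': 'half', '1/4': 'quarter', '3/4': 'three quarters'}
li = ['I own 1/2 bottle', 'Give me 3/4 of the profit']

# Longest keys first, so a longer key wins over a shorter key that is its prefix.
keys = sorted(reps, key=len, reverse=True)

# re.escape keeps characters such as '.' or '*' in a key from acting as regex syntax.
# Assumes reps is non-empty; an empty alternation would match the empty string.
pattern = re.compile('|'.join(re.escape(k) for k in keys))

def replace_all(text):
    # Each match is looked up in the dictionary; replaced text is not scanned again.
    return pattern.sub(lambda m: reps[m.group(0)], text)

result = [replace_all(s) for s in li]
print(result)  # ['I own half bottle', 'Give me three quarters of the profit']
```

Unlike the whitespace-splitting version, this leaves the original spacing untouched and also matches keys that sit next to punctuation.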

As noted in comments, this approach fundamentally depends on being able to tokenize on whitespace. If any of your replacement keys contains whitespace (say, you want to replace a series of words), it will not work. However, replacing single words is a far more frequent operation than replacing groups of words, so I disagree with the commenters who believe that this approach isn't generic enough.
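
The limits of whitespace tokenization can be seen in a small sketch. The key 'a lot' and the function name replace_tokens are hypothetical and only there for illustration:

```python
# 'a lot' is a hypothetical multi-word key added for illustration.
reps = {'1/2': 'half', 'a lot': 'plenty'}

def replace_tokens(sentence, replacements):
    """Replace whole whitespace-separated tokens that are keys of the dictionary."""
    return ' '.join(replacements.get(w, w) for w in sentence.split())

single = replace_tokens('I own 1/2 bottle', reps)
multi = replace_tokens('I own a lot of bottles', reps)
punct = replace_tokens('I own 1/2, not more', reps)
spaced = replace_tokens('I  own   1/2 bottle', reps)

print(single)  # I own half bottle
print(multi)   # I own a lot of bottles  (multi-word key never matches a token)
print(punct)   # I own 1/2, not more     (the token is '1/2,' and not '1/2')
print(spaced)  # I own half bottle       (runs of spaces collapse to one space)
```

So the approach fits text where every key is a single token surrounded by whitespace and where the exact spacing of the original does not need to be preserved.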

roippi answered Oct 04 '22 10:10