Is there any recommended way to do multiple string substitutions other than doing <code>replace</code> chaining on a string (i.e. <code>text.replace(a, b).replace(c, d).replace(e, f)...</code>)? How would you, for example, implement a fast function that behaves like PHP's <code>htmlspecialchars</code> in Python? I compared (1) multiple <code>replace</code> method, (2) the regular expression method, and (3) Matt Anderson's method. With n=10 runs, the results came up as follows: On 100 characters: <pre class="prettyprint"> TIME: 0 ms [ replace_method(str) ] TIME: 5 ms [ regular_expression_method(str, dict) ] TIME: 1 ms [ matts_multi_replace_method(list, str) ] </pre> On 1000 characters: <pre class="prettyprint"> TIME: 0 ms [ replace_method(str) ] TIME: 3 ms [ regular_expression_method(str, dict) ] TIME: 2 ms [ matts_multi_replace_method(list, str) ] </pre> On 10000 characters: <pre class="prettyprint"> TIME: 3 ms [ replace_method(str) ] TIME: 7 ms [ regular_expression_method(str, dict) ] TIME: 5 ms [ matts_multi_replace_method(list, str) ] </pre> On 100000 characters: <pre class="prettyprint"> TIME: 36 ms [ replace_method(str) ] TIME: 46 ms [ regular_expression_method(str, dict) ] TIME: 39 ms [ matts_multi_replace_method(list, str) ] </pre> On 1000000 characters: <pre class="prettyprint"> TIME: 318 ms [ replace_method(str) ] TIME: 360 ms [ regular_expression_method(str, dict) ] TIME: 320 ms [ matts_multi_replace_method(list, str) ] </pre> On 3687809 characters: <pre class="prettyprint"> TIME: 1.277524 sec [ replace_method(str) ] TIME: 1.290590 sec [ regular_expression_method(str, dict) ] TIME: 1.116601 sec [ matts_multi_replace_method(list, str) ] </pre> So kudos to Matt for beating the multi <code>replace</code> method on a fairly large input string. Anyone got ideas for beating it on a smaller string?

Something like the following maybe? Split the text into pieces with the first "from" item to be replaced, then recursively split each of those parts into sub-parts with the next "from" item to be replaced, and so on, until you've visited all your replacements. Then join with the "to" replacement item for each as recursive function completes. A little hard to wrap your head around the following code perhaps (it was for me, and I wrote it), but it seems to function as intended. I didn't benchmark it, but I suspect it would be reasonably fast. <pre class="prettyprint"><code>def multi_replace(pairs, text): stack = list(pairs) stack.reverse() def replace(stack, parts): if not stack: return parts # copy the stack so I don't disturb parallel recursions stack = list(stack) from_, to = stack.pop() #print 'split (%r=>%r)' % (from_, to), parts split_parts = [replace(stack, part.split(from_)) for part in parts] parts = [to.join(split_subparts) for split_subparts in split_parts] #print 'join (%r=>%r)' % (from_, to), parts return parts return replace(stack, [text])[0] print multi_replace( [('foo', 'bar'), ('baaz', 'foo'), ('quux', 'moop')], 'foobarbaazfooquuxquux') </code></pre> for: <pre class="prettyprint"><code>barbarfoobarmoopmoop </code></pre>

Normally, <code>.replace</code> method beats all other methods. (See my benchmarks above.)

Fastest implementation to do multiple string substitutions in Python

Is there any recommended way to do multiple string substitutions other than doing replace chaining on a string (i.e. text.replace(a, b).replace(c, d).replace(e, f)...)? How would you, for example, implement a fast function that behaves like PHP's htmlspecialchars in Python?

I compared (1) multiple replace method, (2) the regular expression method, and (3) Matt Anderson's method.

With n=10 runs, the results came up as follows:

On 100 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 5 ms [ regular_expression_method(str, dict) ]
TIME: 1 ms [ matts_multi_replace_method(list, str) ]

On 1000 characters:

TIME: 0 ms [ replace_method(str) ]
TIME: 3 ms [ regular_expression_method(str, dict) ]
TIME: 2 ms [ matts_multi_replace_method(list, str) ]

On 10000 characters:

TIME: 3 ms [ replace_method(str) ]
TIME: 7 ms [ regular_expression_method(str, dict) ]
TIME: 5 ms [ matts_multi_replace_method(list, str) ]

On 100000 characters:

TIME: 36 ms [ replace_method(str) ]
TIME: 46 ms [ regular_expression_method(str, dict) ]
TIME: 39 ms [ matts_multi_replace_method(list, str) ]

On 1000000 characters:

TIME: 318 ms [ replace_method(str) ]
TIME: 360 ms [ regular_expression_method(str, dict) ]
TIME: 320 ms [ matts_multi_replace_method(list, str) ]

On 3687809 characters:

TIME: 1.277524 sec [ replace_method(str) ]
TIME: 1.290590 sec [ regular_expression_method(str, dict) ]
TIME: 1.116601 sec [ matts_multi_replace_method(list, str) ]

So kudos to Matt for beating the multi replace method on a fairly large input string.

Anyone got ideas for beating it on a smaller string?

How do you do multiple replaces in Python?

Method 2: Replace multiple characters using translate() + maketrans() There is also a dedication function that can perform this type of replacement task in a single line hence this is a recommended way to solve this particular problem.

How do you replace multiple words in a string in Python?

Use the translate() method to replace multiple different characters. You can create the translation table specified in translate() by the str. maketrans() . Specify a dictionary whose key is the old character and whose value is the new string in the str.

How do you replace multiple elements in a list in Python?

One way that we can do this is by using a for loop. One of the key attributes of Python lists is that they can contain duplicate values. Because of this, we can loop over each item in the list and check its value. If the value is one we want to replace, then we replace it.

Something like the following maybe? Split the text into pieces with the first "from" item to be replaced, then recursively split each of those parts into sub-parts with the next "from" item to be replaced, and so on, until you've visited all your replacements. Then join with the "to" replacement item for each as recursive function completes.

A little hard to wrap your head around the following code perhaps (it was for me, and I wrote it), but it seems to function as intended. I didn't benchmark it, but I suspect it would be reasonably fast.

def multi_replace(pairs, text):
    stack = list(pairs)
    stack.reverse()
    def replace(stack, parts):
        if not stack:
            return parts
        # copy the stack so I don't disturb parallel recursions
        stack = list(stack) 
        from_, to = stack.pop()
        #print 'split (%r=>%r)' % (from_, to), parts
        split_parts = [replace(stack, part.split(from_)) for part in parts]
        parts = [to.join(split_subparts) for split_subparts in split_parts]
        #print 'join (%r=>%r)' % (from_, to), parts
        return parts
    return replace(stack, [text])[0]


print multi_replace(
    [('foo', 'bar'), ('baaz', 'foo'), ('quux', 'moop')], 
    'foobarbaazfooquuxquux')

for:

barbarfoobarmoopmoop

Normally, .replace method beats all other methods. (See my benchmarks above.)

Fastest implementation to do multiple string substitutions in Python

Tags:

python

string

php

OTZ

People also ask

2 Answers

Matt Anderson

OTZ

Recent Activity

Donate For Us

Fastest implementation to do multiple string substitutions in Python

Tags:

python

string

php

OTZ

People also ask

2 Answers

Matt Anderson

OTZ

Related questions

Recent Activity

Donate For Us