
Mass string replace in python?

Say I have a string that looks like this:

    str = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

You'll notice a lot of locations in the string where there is an ampersand, followed by a character (such as "&y" and "&c"). I need to replace these characters with an appropriate value that I have in a dictionary, like so:

    dict = {"&y":"\033[0;30m",
            "&c":"\033[0;31m",
            "&b":"\033[0;32m",
            "&Y":"\033[0;33m",
            "&u":"\033[0;34m"}

What is the fastest way to do this? I could manually find all the ampersands, then loop through the dictionary to change them, but that seems slow. Doing a bunch of regex replaces seems slow as well (I will have a dictionary of about 30-40 pairs in my actual code).

Any suggestions are appreciated, thanks.

Edit:

As has been pointed out in comments throughout this question, my dictionary is defined before runtime and will never change during the course of the application's life cycle. It is a list of ANSI escape sequences, and will have about 40 items in it. My average string length to compare against will be about 500 characters, but there will be some that are up to 5000 characters (although these will be rare). I am also using Python 2.6 currently.

Edit #2: I accepted Tor Valamo's answer as the correct one, as it not only gave a valid solution (although it wasn't the best solution), but took all others into account and did a tremendous amount of work to compare all of them. That answer is one of the best, most helpful answers I have ever come across on StackOverflow. Kudos to you.

Mike Trpcic asked Dec 17 '09 02:12




1 Answer

    mydict = {"&y":"\033[0;30m",
              "&c":"\033[0;31m",
              "&b":"\033[0;32m",
              "&Y":"\033[0;33m",
              "&u":"\033[0;34m"}
    mystr = "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"

    for k, v in mydict.iteritems():
        mystr = mystr.replace(k, v)

    print mystr

    The ←[0;30mquick ←[0;31mbrown ←[0;32mfox ←[0;33mjumps over the ←[0;34mlazy dog
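The loop above is written for Python 2 (`iteritems`, the `print` statement). A minimal Python 3 sketch of the same approach follows. The helper name `replace_all` and the variable `color_codes` are made up for illustration, and the sketch assumes that no replacement value contains one of the keys, since a later pass would otherwise rewrite the output of an earlier one.

```python
# ANSI color codes keyed by the markers used in the question.
color_codes = {
    "&y": "\033[0;30m",
    "&c": "\033[0;31m",
    "&b": "\033[0;32m",
    "&Y": "\033[0;33m",
    "&u": "\033[0;34m",
}


def replace_all(text, table):
    """Apply str.replace once per key; every pass scans the whole string."""
    for key, value in table.items():
        text = text.replace(key, value)
    return text


colored = replace_all(
    "The &yquick &cbrown &bfox &Yjumps over the &ulazy dog", color_codes
)
# repr() keeps the escape sequences visible instead of coloring the terminal.
print(repr(colored))
```

The cost grows with the number of keys, because each key triggers its own pass over the string.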

I took the liberty of comparing a few solutions:

    mydict = dict([('&' + chr(i), str(i)) for i in list(range(65, 91)) + list(range(97, 123))])

    # random inserts between keys
    from random import randint
    rawstr = ''.join(mydict.keys())
    mystr = ''
    for i in range(0, len(rawstr), 2):
        mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars

    from time import time

    # How many times to run each solution
    rep = 10000

    print 'Running %d times with string length %d and ' \
          'random inserts of lengths 0-20' % (rep, len(mystr))

    # My solution
    t = time()
    for x in range(rep):
        for k, v in mydict.items():
            mystr.replace(k, v)
        #print(mystr)
    print '%-30s' % 'Tor fixed & variable dict', time()-t

    from re import sub, compile, escape

    # Peter Hansen
    t = time()
    for x in range(rep):
        sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict
    print '%-30s' % 'Peter fixed & variable dict', time()-t

    # Claudiu
    def multiple_replace(dict, text):
        # Create a regular expression  from the dictionary keys
        regex = compile("(%s)" % "|".join(map(escape, dict.keys())))

        # For each match, look-up corresponding value in dictionary
        return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

    t = time()
    for x in range(rep):
        multiple_replace(mydict, mystr)
    print '%-30s' % 'Claudio variable dict', time()-t

    # Claudiu - Precompiled
    regex = compile("(%s)" % "|".join(map(escape, mydict.keys())))

    t = time()
    for x in range(rep):
        regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr)
    print '%-30s' % 'Claudio fixed dict', time()-t

    # Andrew Y - variable dict
    def mysubst(somestr, somedict):
      subs = somestr.split("&")
      return subs[0] + "".join(map(lambda arg: somedict["&" + arg[0:1]] + arg[1:], subs[1:]))

    t = time()
    for x in range(rep):
        mysubst(mystr, mydict)
    print '%-30s' % 'Andrew Y variable dict', time()-t

    # Andrew Y - fixed
    def repl(s):
      return mydict["&"+s[0:1]] + s[1:]

    t = time()
    for x in range(rep):
        subs = mystr.split("&")
        res = subs[0] + "".join(map(repl, subs[1:]))
    print '%-30s' % 'Andrew Y fixed dict', time()-t

Results in Python 2.6

    Running 10000 times with string length 490 and random inserts of lengths 0-20
    Tor fixed & variable dict      1.04699993134
    Peter fixed & variable dict    0.218999862671
    Claudio variable dict          2.48400020599
    Claudio fixed dict             0.0940001010895
    Andrew Y variable dict         0.0309998989105
    Andrew Y fixed dict            0.0310001373291

Both Claudiu's and Andrew's solutions kept reporting a time of 0, so I had to increase the count to 10,000 runs.
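Timings that come out as 0 usually mean the measured work is shorter than the resolution of the clock. One alternative, shown here as a sketch and not what the benchmark in this answer used, is the standard `timeit` module, which runs a callable many times with a timer chosen for short intervals. The names `split_replace`, `table` and `text` are made up for the example.

```python
import timeit

# Keys are stored without the leading "&".
table = {"y": "\033[0;30m", "c": "\033[0;31m"}
text = "The &yquick &cbrown fox"


def split_replace(s, d):
    """Split on "&" and look up the first character of every later piece."""
    parts = s.split("&")
    return parts[0] + "".join(d[p[0]] + p[1:] for p in parts[1:])


# repeat() returns one total time per run; the minimum is the least noisy.
timings = timeit.repeat(lambda: split_replace(text, table), number=10000, repeat=3)
best = min(timings)
print("best of 3: %.6f s for 10000 calls" % best)
```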

I ran it in Python 3 (because of unicode) with replacements of chars from 39 to 1024 (38 is ampersand, so I didn't want to include it). String length is up to 10,000, including about 980 replacements, with variable random inserts of length 0-20. The unicode values from 39 to 1024 produce characters that are both 1 and 2 bytes long, which could affect some solutions.
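A test string matching that description could be built as in the sketch below. It assumes the intent is to follow every key with 0 to 20 copies of a random uppercase letter. `build_test_string` is a hypothetical helper, not part of the benchmark, and the `seed` parameter is only there to make runs repeatable.

```python
import random


def build_test_string(keys, max_fill=20, seed=None):
    """Concatenate every key, each followed by 0..max_fill filler letters."""
    rng = random.Random(seed)
    parts = []
    for key in keys:
        parts.append(key)
        # randint is inclusive at both ends; 65..90 covers "A" through "Z".
        parts.append(chr(rng.randint(65, 90)) * rng.randint(0, max_fill))
    return "".join(parts)


# One key per code point from 39 to 1023; 38 ("&") is skipped on purpose.
keys = ["&" + chr(i) for i in range(39, 1024)]
sample = build_test_string(keys, seed=0)
print(len(keys), len(sample))
```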

    mydict = dict([('&' + chr(i), str(i)) for i in range(39,1024)])

    # random inserts between keys
    from random import randint
    rawstr = ''.join(mydict.keys())
    mystr = ''
    for i in range(0, len(rawstr), 2):
        mystr += chr(randint(65,91)) * randint(0,20) # insert between 0 and 20 chars

    from time import time

    # How many times to run each solution
    rep = 10000

    print('Running %d times with string length %d and ' \
          'random inserts of lengths 0-20' % (rep, len(mystr)))

    # Tor Valamo - too long
    #t = time()
    #for x in range(rep):
    #    for k, v in mydict.items():
    #        mystr.replace(k, v)
    #print('%-30s' % 'Tor fixed & variable dict', time()-t)

    from re import sub, compile, escape

    # Peter Hansen
    t = time()
    for x in range(rep):
        sub(r'(&[a-zA-Z])', r'%(\1)s', mystr) % mydict
    print('%-30s' % 'Peter fixed & variable dict', time()-t)

    # Peter 2
    def dictsub(m):
        return mydict[m.group()]

    t = time()
    for x in range(rep):
        sub(r'(&[a-zA-Z])', dictsub, mystr)
    print('%-30s' % 'Peter fixed dict', time()-t)

    # Claudiu - too long
    #def multiple_replace(dict, text):
    #    # Create a regular expression  from the dictionary keys
    #    regex = compile("(%s)" % "|".join(map(escape, dict.keys())))
    #
    #    # For each match, look-up corresponding value in dictionary
    #    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)
    #
    #t = time()
    #for x in range(rep):
    #    multiple_replace(mydict, mystr)
    #print('%-30s' % 'Claudio variable dict', time()-t)

    # Claudiu - Precompiled
    regex = compile("(%s)" % "|".join(map(escape, mydict.keys())))

    t = time()
    for x in range(rep):
        regex.sub(lambda mo: mydict[mo.string[mo.start():mo.end()]], mystr)
    print('%-30s' % 'Claudio fixed dict', time()-t)

    # Separate setup for Andrew and gnibbler optimized dict
    mydict = dict((k[1], v) for k, v in mydict.items())

    # Andrew Y - variable dict
    def mysubst(somestr, somedict):
      subs = somestr.split("&")
      return subs[0] + "".join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:]))

    def mysubst2(somestr, somedict):
      subs = somestr.split("&")
      return subs[0].join(map(lambda arg: somedict[arg[0:1]] + arg[1:], subs[1:]))

    t = time()
    for x in range(rep):
        mysubst(mystr, mydict)
    print('%-30s' % 'Andrew Y variable dict', time()-t)
    t = time()
    for x in range(rep):
        mysubst2(mystr, mydict)
    print('%-30s' % 'Andrew Y variable dict 2', time()-t)

    # Andrew Y - fixed
    def repl(s):
      return mydict[s[0:1]] + s[1:]

    t = time()
    for x in range(rep):
        subs = mystr.split("&")
        res = subs[0] + "".join(map(repl, subs[1:]))
    print('%-30s' % 'Andrew Y fixed dict', time()-t)

    # gnibbler
    t = time()
    for x in range(rep):
        myparts = mystr.split("&")
        myparts[1:]=[mydict[x[0]]+x[1:] for x in myparts[1:]]
        "".join(myparts)
    print('%-30s' % 'gnibbler fixed & variable dict', time()-t)

Results:

    Running 10000 times with string length 9491 and random inserts of lengths 0-20
    Tor fixed & variable dict      0.0 # disqualified 329 secs
    Peter fixed & variable dict    2.07799983025
    Peter fixed dict               1.53100013733
    Claudio variable dict          0.0 # disqualified, 37 secs
    Claudio fixed dict             1.5
    Andrew Y variable dict         0.578000068665
    Andrew Y variable dict 2       0.56299996376
    Andrew Y fixed dict            0.56200003624
    gnibbler fixed & variable dict 0.530999898911

(Note that gnibbler's code uses a different dict, where the keys don't include the '&'. Andrew's code also uses this alternate dict, but it didn't make much of a difference, maybe just 0.01x speedup.)
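For reference, here is the split-based approach applied to the string from the question, as a Python 3 sketch using a dict whose keys omit the '&'. The function name `colorize` and the fallback for unknown markers are additions for illustration. The benchmarked versions index the dict directly and would raise on an unknown marker or on a trailing '&'.

```python
# Same codes as in the question, keyed by the character after the "&".
codes = {
    "y": "\033[0;30m",
    "c": "\033[0;31m",
    "b": "\033[0;32m",
    "Y": "\033[0;33m",
    "u": "\033[0;34m",
}


def colorize(text, table):
    """Replace "&x" markers in one pass by splitting on "&"."""
    head, *rest = text.split("&")
    out = [head]
    for piece in rest:
        # piece[:1] is "" for an empty piece, e.g. after "&&" or a trailing "&".
        code = table.get(piece[:1])
        if code is None:
            out.append("&" + piece)  # unknown marker: put the "&" back
        else:
            out.append(code + piece[1:])
    return "".join(out)


result = colorize("The &yquick &cbrown &bfox &Yjumps over the &ulazy dog", codes)
print(repr(result))
```

The string is scanned once by `split` and joined once at the end, so the work does not grow with the number of dictionary entries the way the per-key `replace` loop does.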

10 revs answered Oct 06 '22 21:10
