How can I do multiple substitutions using regex?

Tags: python, string, regex

I can use the code below to create a new file in which every a is replaced with aa using regular expressions.

import re

with open("notes.txt") as text:
    new_text = re.sub("a", "aa", text.read())
    with open("notes2.txt", "w") as result:
        result.write(new_text)

To change more than one letter in my text, do I have to repeat this line, new_text = re.sub("a", "aa", text.read()), once for each letter, substituting a different letter each time?

That is, a --> aa, b --> bb and c --> cc.

Do I have to write that line for every letter I want to change, or is there an easier way, perhaps by creating a "dictionary" of translations? Should I put those letters into an array? I'm not sure how to look them up if I do.

asked Mar 02 '13 13:03

Euridice01

2 Answers

The answer proposed by @nhahtdh is valid, but I would argue that it is less Pythonic than the canonical example, which uses code less opaque than his regex manipulations and takes advantage of Python's built-in data structures and anonymous functions.

A dictionary of translations makes sense in this context. In fact, that's how the Python Cookbook does it, as shown in this example (copied from ActiveState http://code.activestate.com/recipes/81330-single-pass-multiple-replace/ )

import re

def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

if __name__ == "__main__":
    text = "Larry Wall is the creator of Perl"

    dict = {
        "Larry Wall": "Guido van Rossum",
        "creator": "Benevolent Dictator for Life",
        "Perl": "Python",
    }

    print(multiple_replace(dict, text))

So in your case, you could make a dict trans = {"a": "aa", "b": "bb"} and then pass it into multiple_replace along with the text you want translated. Basically, all the function does is build one large regex that matches any of the dictionary keys (escaped, so they are treated as literal strings). regex.sub then calls the lambda once for each match to look up the replacement in the dictionary.
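For the a, b, c case from the question, a minimal sketch might look like the following. It assumes a compact variant of the function above (multiple_replace_sketch is a hypothetical name, and its first parameter is called replacements to avoid shadowing the built-in dict) and a made-up sample string.

```python
import re

def multiple_replace_sketch(replacements, text):
    # Hypothetical helper mirroring the recipe above: build one pattern
    # that matches any key, with each key escaped so it is treated literally.
    pattern = re.compile("|".join(map(re.escape, replacements)))
    # The lambda is called once per match and returns the replacement string.
    return pattern.sub(lambda mo: replacements[mo.group(0)], text)

trans = {"a": "aa", "b": "bb", "c": "cc"}
sample = "a cab"  # made-up sample text
doubled = multiple_replace_sketch(trans, sample)
print(doubled)  # aa ccaabb
```

Because the text is scanned in a single pass, an inserted "aa" is not scanned again, which is what separates this from chaining several re.sub calls.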

You could use this function while reading from your file, for example:

# trans is the translation dict described above
with open("notes.txt") as text:
    new_text = multiple_replace(trans, text.read())
with open("notes2.txt", "w") as result:
    result.write(new_text)

I've actually used this exact method in production, in a case where I needed to translate the months of the year from Czech into English for a web scraping task.

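As @nhahtdh pointed out, one downside to this approach is that it requires the keys to be prefix-free. Regex alternation tries the alternatives from left to right, so if one dictionary key is a prefix of another and comes first in the pattern, the shorter key always matches and the longer key is never used.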
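One possible workaround for the prefix problem, which is not part of the original recipe, is to sort the keys longest first before joining them, so that a key is always tried before any of its own prefixes. The sketch below assumes a hypothetical helper name (multiple_replace_longest_first) and a made-up dictionary and sample string.

```python
import re

def multiple_replace_longest_first(replacements, text):
    # Hypothetical variant: sort keys longest first so that a longer key
    # is always tried before a shorter key that is its prefix.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda mo: replacements[mo.group(0)], text)

trans = {"a": "1", "ab": "2"}  # "a" is a prefix of "ab"
result = multiple_replace_longest_first(trans, "ab a")
print(result)  # 2 1
```

Without the sort, the pattern would be a|ab, and "ab" would be rewritten as "1b" because the shorter alternative wins.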

answered Oct 01 '22 21:10

Emmett Butler

You can use a capturing group and a backreference:

re.sub(r"([characters])", r"\1\1", text.read())

Put characters that you want to double up in between []. For the case of lower case a, b, c:

re.sub(r"([abc])", r"\1\1", text.read())

In the replacement string, you can refer to whatever was matched by a capturing group () with the \n notation, where n is a positive integer (0 excluded). \1 refers to the first capturing group. There is another notation, \g<n>, where n can be any non-negative integer (0 allowed); \g<0> refers to the whole text matched by the expression.
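A short sketch comparing the two notations, using a made-up sample string. The variable names are only for illustration.

```python
import re

sample = "abc xyz"  # made-up sample text

# \1 and \g<1> both refer to the first capturing group.
numbered = re.sub(r"([abc])", r"\1\1", sample)
g_notation = re.sub(r"([abc])", r"\g<1>\g<1>", sample)

# \g<0> refers to the whole match, so no capturing group is needed.
whole_match = re.sub(r"[abc]", r"\g<0>\g<0>", sample)

# \g<1>0 means "group 1 followed by a literal 0"; \10 would instead be
# read as a reference to group 10.
followed_by_digit = re.sub(r"([abc])", r"\g<1>0", sample)

print(numbered)           # aabbcc xyz
print(followed_by_digit)  # a0b0c0 xyz
```

All of the first three calls produce the same doubled output; the \g<n> form mainly helps when the reference is followed by a digit.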

If you want to double up all characters except newlines:

re.sub(r"(.)", r"\1\1", text.read())

If you want to double up all characters (newlines included):

re.sub(r"(.)", r"\1\1", text.read(), flags=re.S)
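To see the effect of re.S (also spelled re.DOTALL), here is a small sketch on a made-up two-line string. Without the flag the newline is left alone, and with it the newline is doubled as well.

```python
import re

sample = "ab\ncd"  # made-up two-line sample

# By default, "." matches any character except a newline.
without_dotall = re.sub(r"(.)", r"\1\1", sample)

# With re.S, "." also matches the newline character.
with_dotall = re.sub(r"(.)", r"\1\1", sample, flags=re.S)

print(repr(without_dotall))  # 'aabb\nccdd'
print(repr(with_dotall))     # 'aabb\n\nccdd'
```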

answered Oct 01 '22 19:10

nhahtdh

