Replace all matches using re.findall()

Tags:

python

regex

Using re.findall() I've managed to return multiple matches of a regex in a string. However, the object returned is a list of the matches within the string. This is not what I want.

What I want is to replace all matches with something else. I've tried to use syntax similar to what you would use in re.sub, like so:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
print myfile

However, this creates the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 9, in <module>
    myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python27\lib\re.py", line 229, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

Can anyone assist me with the last bit of syntax I need to replace all matches with something else within the original Python object?

EDIT:

In line with the comments and answers received, here is my attempt to substitute one regex with another:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = f.read()
myfile2 = re.sub(regex, regex2, myfile)
print myfile

This now produces the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 11, in <module>
    myfile2 = re.sub(regex, regex2, myfile)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 258, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Python27\lib\sre_parse.py", line 706, in parse_template
    s = Tokenizer(source)
  File "C:\Python27\lib\sre_parse.py", line 181, in __init__
    self.__next()
  File "C:\Python27\lib\sre_parse.py", line 183, in __next
    if self.index >= len(self.string):
TypeError: object of type '_sre.SRE_Pattern' has no len()


asked Sep 19 '15 16:09

gdogg371

2 Answers

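import re

# Match a double quote that sits between two letters.
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
myfile = 'foo"s bar'
# Swap the quote inside each match for a percent sign, keeping the letters.
myfile2 = regex.sub(lambda m: m.group().replace('"', "%", 1), myfile)
print(myfile2)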

answered Sep 23 '22 18:09

Padraic Cunningham

If I understand your question correctly, you're trying to replace a quotation mark between two letters with a percent sign between those letters.

There are several ways to do this with re.sub (re.findall doesn't do replacements at all, so your initial attempts were always doomed to fail).
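To make the difference concrete, here is a minimal sketch in Python 3 syntax (the sample string is made up for illustration and stands in for the file contents). re.findall only reports what matched, while re.sub returns a new string. Note also that the second positional argument of re.findall is the string to search and the third is flags, which is why passing the file contents third led to the flags-related TypeError in the question.

```python
import re

# Hypothetical sample text standing in for f.read().
text = 'foo"s bar and it"s'

pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')

# findall only lists the matched substrings; nothing is replaced.
matches = pattern.findall(text)

# sub returns a new string; the original text is left unchanged.
replaced = pattern.sub('%', text)

print(matches)   # ['o"s', 't"s']
print(replaced)  # fo% bar and i%
```

Replacing the whole match with '%' also swallows the neighbouring letters, which is the problem the rest of this answer deals with.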

An easy approach would be to change your pattern to group the letters separately, and then use a replacement string that includes backreferences:

pattern = re.compile('([a-zA-Z])\"([a-zA-Z])', re.S)
re.sub(pattern, r'\1%\2', text)
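As a quick check of the backreference version, here is a small self-contained sketch in Python 3 syntax. The sample text is an assumption; in the question it would be the contents of the file.

```python
import re

# Hypothetical sample text; in the question this would come from f.read().
text = 'foo"s bar'

# Each letter gets its own group so it can be put back via \1 and \2.
pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])')

result = pattern.sub(r'\1%\2', text)
print(result)  # foo%s bar
```

The raw string prefix on the replacement matters: without it, '\1' would be read as the character with code 1 instead of a backreference.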

Another option would be to use a replacement function instead of a replacement string. The function will be called with a match object for each match found in the text, and its return value is the replacement:

pattern = re.compile('[a-zA-Z]\"[a-zA-Z]', re.S)
re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)

(There are probably lots of other ways of implementing the lambda function. I like string formatting.)
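For instance, one of those other ways might be a named function that slices the matched text. This is only a sketch in Python 3 syntax; the function name quote_to_percent and the sample text are made up for illustration.

```python
import re

def quote_to_percent(match):
    # match.group() is the whole three-character match, e.g. 'o"s'.
    # Keep the first and last characters and swap the middle one.
    whole = match.group()
    return whole[0] + '%' + whole[-1]

text = 'foo"s bar'  # hypothetical sample text
pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')

result = pattern.sub(quote_to_percent, text)
print(result)  # foo%s bar
```

A named function is a little longer than the lambda, but it leaves room for comments and is easier to test on its own.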

However, probably the best approach is to use a lookahead and a lookbehind in your pattern to make sure your quotation mark is between letters without actually matching those letters. This lets you use the trivial string '%' as the replacement:

pattern = re.compile('(?<=[a-zA-Z])\"(?=[a-zA-Z])', re.S)
re.sub(pattern, '%', text)

This does have very slightly different semantics than the other versions. A text like 'a"b"c' will have both quotation marks replaced, while the previous versions would only replace the first one, because the b is consumed by the first match and is no longer available to precede the second quotation mark. Hopefully this is an improvement!
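The difference is easy to see side by side. The following sketch (Python 3 syntax, with variable names chosen here for illustration) runs the backreference pattern and the lookaround pattern on the same input:

```python
import re

text = 'a"b"c'

consuming = re.compile(r'([a-zA-Z])"([a-zA-Z])')
lookaround = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')

# The first match consumes 'a"b', so the search resumes at the second
# quote with no unconsumed letter in front of it.
first_only = consuming.sub(r'\1%\2', text)

# Lookarounds check the neighbouring letters without consuming them,
# so the 'b' can serve both quotation marks.
both = lookaround.sub('%', text)

print(first_only)  # a%b"c
print(both)        # a%b%c
```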

answered Sep 22 '22 18:09

Blckknght
