
Replace all matches using re.findall()

Tags:

python

regex

Using re.findall() I've managed to return multiple matches of a regex in a string. However, the object returned is a list of the matches within the string. This is not what I want.

What I want is to replace all matches with something else. I've tried to use syntax similar to what you would use with re.sub, like so:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
print myfile

However, this creates the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 9, in <module>
    myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python27\lib\re.py", line 229, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

Can anyone assist me with the last bit of syntax I need to replace all matches with something else within the original Python object?

EDIT:

In line with the comments and answers received, here is my attempt to substitute one regex with another:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = f.read()
myfile2 = re.sub(regex, regex2, myfile)
print myfile

This now produces the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 11, in <module>
    myfile2 = re.sub(regex, regex2, myfile)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 258, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Python27\lib\sre_parse.py", line 706, in parse_template
    s = Tokenizer(source)
  File "C:\Python27\lib\sre_parse.py", line 181, in __init__
    self.__next()
  File "C:\Python27\lib\sre_parse.py", line 183, in __next
    if self.index >= len(self.string):
TypeError: object of type '_sre.SRE_Pattern' has no len()
gdogg371 asked Sep 19 '15 16:09



2 Answers

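import re

# Match a double quote sandwiched between two ASCII letters.
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
myfile = 'foo"s bar'
# For each match, swap the quote for a percent sign and keep the letters.
myfile2 = regex.sub(lambda m: m.group().replace('"', "%", 1), myfile)
print(myfile2)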
Padraic Cunningham answered Sep 23 '22 18:09


If I understand your question correctly, you're trying to replace a quotation mark between two characters with a percent sign between those characters.

There are several ways to do this with re.sub (re.findall doesn't do replacements at all, so your initial attempts were always doomed to fail).
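To make the difference concrete, here is a minimal sketch (the sample string is made up for illustration, it is not from the question's file). re.findall only reports matches, while re.sub returns a new string. The tracebacks in the question fit this picture: the third positional argument of re.findall is flags, not the text, and the replacement given to re.sub needs to be a string or a function, not a compiled pattern.

```python
import re

# Hypothetical sample text standing in for the contents of the file.
text = 'foo"s bar'

# The question's pattern: a quote between two ASCII letters.
pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')

# findall(string) only lists what matched; text itself is unchanged.
matches = pattern.findall(text)
print(matches)

# sub(repl, string) returns a new string with each match replaced.
# Replacing the whole match with '%' also swallows the two letters.
replaced = pattern.sub('%', text)
print(replaced)
```

Note that the naive replacement eats the surrounding letters, which is why either the pattern or the replacement has to be adjusted.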

An easy approach would be to change your pattern to group the letters separately, and then use a replacement string that includes backreferences:

pattern = re.compile('([a-zA-Z])\"([a-zA-Z])', re.S)
re.sub(pattern, r'\1%\2', text)
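As a quick check of the backreference approach, here is a self-contained sketch. The sample string is an assumption for illustration. \1 and \2 stand for the letters captured before and after the quotation mark, and \g<1> is an equivalent, more explicit spelling.

```python
import re

# Hypothetical input; in the question this would come from f.read().
text = 'foo"s bar'

# Group 1 is the letter before the quote, group 2 the letter after it.
pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])')

# A raw string keeps the backslashes in \1 and \2 intact.
result = pattern.sub(r'\1%\2', text)
print(result)

# \g<1> and \g<2> are another way to write the same backreferences.
same = pattern.sub(r'\g<1>%\g<2>', text)
print(same)
```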

Another option would be to use a replacement function instead of a replacement string. The function will be called with a match object for each match found in the text, and its return value is the replacement:

pattern = re.compile('[a-zA-Z]\"[a-zA-Z]', re.S)
re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)

(There are probably lots of other ways of implementing the lambda function. I like string formatting.)
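For instance, a named function can do the same job as the lambda. This is just one possible sketch; the function name quote_to_percent and the sample text are made up for illustration.

```python
import re

def quote_to_percent(match):
    # match.group(0) is the whole three-character match: letter, quote, letter.
    before, _quote, after = match.group(0)
    return before + '%' + after

pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')

# Hypothetical sample text with two quotes between letters.
text = 'say"hi and foo"s bar'

# re.sub calls quote_to_percent once per match and inserts its return value.
result = pattern.sub(quote_to_percent, text)
print(result)
```

A named function is a bit longer than the lambda, but it leaves room for comments and is easier to test on its own.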

However, probably the best approach is to use a lookahead and a lookbehind in your pattern to make sure your quotation mark is between letters without actually matching those letters. This lets you use the trivial string '%' as the replacement:

pattern = re.compile('(?<=[a-zA-Z])\"(?=[a-zA-Z])', re.S)
re.sub(pattern, '%', text)

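This does have very slightly different semantics than the other versions. A text like 'a"b"c' will have both quotation marks replaced, while the previous versions would only replace the first one, because the letter b is consumed by the first match and is no longer available to precede the second quotation mark. Hopefully this is an improvement!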
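A small sketch that shows the difference side by side, using the 'a"b"c' text mentioned above (the variable names are only for illustration):

```python
import re

text = 'a"b"c'

# Consuming version: the letters are part of the match.
consuming = re.compile(r'([a-zA-Z])"([a-zA-Z])')

# Lookaround version: the letters are checked but not consumed.
lookaround = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')

consumed_result = consuming.sub(r'\1%\2', text)
lookaround_result = lookaround.sub('%', text)

print(consumed_result)    # a%b"c
print(lookaround_result)  # a%b%c
```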

Blckknght answered Sep 22 '22 18:09