Using re.findall()
I've managed to return multiple matches of a regex in a string. However, the object returned is a list of the matches within the string, which is not what I want.
What I want is to replace all matches with something else. I've tried to use syntax similar to what you would use with re.sub, like so:
import json
import re
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"
f = open(filepath, 'r')
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
print myfile
However, this creates the following error:
Traceback (most recent call last):
File "C:/Python27/Customer Stuff/Austin's Script.py", line 9, in <module>
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
File "C:\Python27\lib\re.py", line 177, in findall
return _compile(pattern, flags).findall(string)
File "C:\Python27\lib\re.py", line 229, in _compile
bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Can anyone help me with the last bit of syntax I need to replace all matches with something else in the original Python object?
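The first traceback comes from the argument order. re.findall(pattern, string, flags=0) only searches and has no replacement argument, so in the call above the second string is treated as the text to search and the file contents land in the flags slot. The sketch below (Python 3, with a made-up sample string standing in for the tweets file) shows a correct findall() call and that the misplaced argument is rejected. Python 2.7 reports the TypeError shown above; on Python 3 the same call should raise a ValueError instead, because regex is already compiled.

```python
import re

# Same pattern as in the question: a letter, a double quote, a letter.
regex = re.compile('([a-zA-Z]"[a-zA-Z])', re.S)

# Hypothetical sample text standing in for the contents of the tweets file.
sample = 'He said "hi" and foo"s bar'

# findall() only searches: it returns a list of matched substrings.
matches = regex.findall(sample)
print(matches)  # ['o"s']

# An extra positional argument shifts into the flags slot: 'ignored' becomes
# the text to search and `sample` becomes the (invalid) flags value.
error_name = None
try:
    re.findall(regex, 'ignored', sample)
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__
print(error_name)
```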
EDIT:
Following the comments and answers I received, here is my attempt to substitute one regex with another:
import json
import re
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)
filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"
f = open(filepath, 'r')
myfile = f.read()
myfile2 = re.sub(regex, regex2, myfile)
print myfile
This now produces the following error:
Traceback (most recent call last):
File "C:/Python27/Customer Stuff/Austin's Script.py", line 11, in <module>
myfile2 = re.sub(regex, regex2, myfile)
File "C:\Python27\lib\re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "C:\Python27\lib\re.py", line 273, in _subx
template = _compile_repl(template, pattern)
File "C:\Python27\lib\re.py", line 258, in _compile_repl
p = sre_parse.parse_template(repl, pattern)
File "C:\Python27\lib\sre_parse.py", line 706, in parse_template
s = Tokenizer(source)
File "C:\Python27\lib\sre_parse.py", line 181, in __init__
self.__next()
File "C:\Python27\lib\sre_parse.py", line 183, in __next
if self.index >= len(self.string):
TypeError: object of type '_sre.SRE_Pattern' has no len()
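The second traceback has a different cause: the repl argument of re.sub() must be either a string (which may contain backreferences such as \1) or a function that receives a match object. A compiled pattern is neither, so re fails while trying to parse it as a replacement template. A minimal Python 3 sketch on a hypothetical sample string; note that a plain replacement string replaces the whole match, including the letters around the quote:

```python
import re

regex = re.compile('([a-zA-Z]"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)
sample = 'foo"s bar'

# Passing a compiled pattern as the replacement raises TypeError.
failed = False
try:
    re.sub(regex, regex2, sample)
except TypeError:
    failed = True
print(failed)  # True

# A string replacement works, but it swallows the whole three-character match.
whole_match_replaced = re.sub(regex, '%', sample)
print(whole_match_replaced)  # fo% bar
```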
findall() is one of the most useful functions in the re module. Whereas re.search() returns only the first match for a pattern (found at any position in the string), findall() scans the string from left to right, finds *all* non-overlapping matches and returns them as a list of strings, one string per match. If the pattern contains capture groups, the list holds the group contents instead: one string per match for a single group, one tuple per match for several. If the pattern is not found, re.findall() returns an empty list. It never modifies the string it searches.
To replace matches, use re.sub(). By default its count argument is zero, which means re.sub() replaces all occurrences of the pattern in the target string.
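The contrast between the two functions, and the effect of the count argument, can be checked with a short Python 3 sketch on a made-up sample string:

```python
import re

pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')
text = 'a"b c"d e"f'

# findall() returns a list of the matched substrings; `text` is unchanged.
found = pattern.findall(text)
print(found)         # ['a"b', 'c"d', 'e"f']

# sub() returns a new string; count=0 (the default) replaces every match.
all_replaced = pattern.sub('#', text)
print(all_replaced)  # # # #

# A positive count limits how many matches are replaced.
first_only = pattern.sub('#', text, count=1)
print(first_only)    # # c"d e"f
```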
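One way to do the replacement is to pass a function to the compiled pattern's sub() method:
import re
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
myfile = 'foo"s bar'
# For each match, replace the first double quote in the matched text with '%'
myfile2 = regex.sub(lambda m: m.group().replace('"', "%", 1), myfile)
print(myfile2)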
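Calling sub() as a method of the compiled pattern is equivalent to passing that pattern to the module-level re.sub(). The Python 3 sketch below checks this on a hypothetical sample string, using a hypothetical helper named quote_to_percent in place of the lambda. One caveat: with an already-compiled pattern, flags such as re.S belong in re.compile(); passing them to re.sub() as well raises a ValueError.

```python
import re

regex = re.compile('([a-zA-Z]"[a-zA-Z])', re.S)
sample = 'foo"s bar'

def quote_to_percent(m):
    # Hypothetical helper: swap the first double quote in the match for '%'.
    return m.group().replace('"', '%', 1)

via_method = regex.sub(quote_to_percent, sample)
via_module = re.sub(regex, quote_to_percent, sample)
print(via_method, via_module)  # foo%s bar foo%s bar

# Flags cannot be re-applied to a pattern that is already compiled.
flags_error = False
try:
    re.sub(regex, quote_to_percent, sample, flags=re.S)
except ValueError:
    flags_error = True
print(flags_error)  # True
```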
If I understand your question correctly, you're trying to replace a quotation mark between two letters with a percent sign between those letters.
There are several ways to do this with re.sub (re.findall doesn't do replacements at all, so your initial attempts were always doomed to fail).
An easy approach would be to change your pattern to group the letters separately, and then use a replacement string that includes backreferences:
pattern = re.compile('([a-zA-Z])\"([a-zA-Z])', re.S)
re.sub(pattern, r'\1%\2', text)
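A self-contained Python 3 version of the backreference approach, with a made-up sample string in place of the file contents. re.sub() also accepts the explicit \g<n> form; per the re documentation, it avoids ambiguity when a group reference is followed by digits (\20 is read as group 20, whereas \g<2>0 is group 2 followed by a literal 0).

```python
import re

pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])', re.S)
text = 'foo"s bar'

# \1 and \2 re-insert the two captured letters around the new '%'.
numbered = re.sub(pattern, r'\1%\2', text)
print(numbered)  # foo%s bar

# Equivalent, using the explicit \g<n> group syntax.
explicit = re.sub(pattern, r'\g<1>%\g<2>', text)
print(explicit)  # foo%s bar
```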
Another option would be to use a replacement function instead of a replacement string. The function will be called with a match object for each match found in the text, and its return value is the replacement:
pattern = re.compile('[a-zA-Z]\"[a-zA-Z]', re.S)
re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)
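If the lambda feels too terse, the same idea can be written as a named function. This Python 3 sketch uses a hypothetical helper name, percent_between, and reads the two letters from capture groups instead of unpacking the three-character match:

```python
import re

pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])', re.S)

def percent_between(match):
    # Called once per match; the returned string replaces the whole match.
    return match.group(1) + '%' + match.group(2)

text = 'foo"s bar'
result = re.sub(pattern, percent_between, text)
print(result)  # foo%s bar
```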
(There are probably lots of other ways of implementing the lambda function. I like string formatting.)
However, probably the best approach is to use a lookahead and a lookbehind in your pattern to make sure your quotation mark is between letters without actually matching those letters. This lets you use the trivial string '%' as the replacement:
pattern = re.compile('(?<=[a-zA-Z])\"(?=[a-zA-Z])', re.S)
re.sub(pattern, '%', text)
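Applied to the original task (read a file, replace every match, keep the result), the lookaround version might look like the Python 3 sketch below. It creates a temporary file with hypothetical sample content so it runs anywhere; with a real file you would open your own path instead. Because re.sub() returns a new string, the result has to be assigned or written back to be kept. re.S is left out since it only affects '.', which this pattern doesn't use.

```python
import os
import re
import tempfile

pattern = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')

# Create a throwaway file with hypothetical sample content.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('foo"s bar\nbaz"s qux\n')
    path = tmp.name

try:
    with open(path, encoding='utf-8') as f:
        original = f.read()

    # re.sub() leaves `original` untouched and returns the replaced text.
    replaced = re.sub(pattern, '%', original)

    # Write the result back to the same file.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(replaced)

    with open(path, encoding='utf-8') as f:
        print(f.read())
finally:
    os.remove(path)
```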
This does have very slightly different semantics from the other versions. A text like 'a"b"c' will have both quotation marks replaced, while the previous versions would only replace the first one. Hopefully this is an improvement!
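The difference can be checked directly on the 'a"b"c' example with a short Python 3 sketch:

```python
import re

grouped = re.compile(r'([a-zA-Z])"([a-zA-Z])')
lookaround = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')

text = 'a"b"c'

# The grouped pattern consumes the 'b' in its first match, so the second
# quote has no unconsumed letter before it and is left alone.
grouped_result = grouped.sub(r'\1%\2', text)
print(grouped_result)     # a%b"c

# Lookarounds only check the neighbouring letters without consuming them.
lookaround_result = lookaround.sub('%', text)
print(lookaround_result)  # a%b%c
```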