Situation:
I want to both delete the regex matches from the text and see what they actually contain. Currently, I do it like this:
import re
ab_re = re.compile("[ab]")
text="abcdedfe falijbijie bbbb laifsjelifjl"
ab_re.findall(text)
# ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'a']
ab_re.sub('',text)
# 'cdedfe flijijie  lifsjelifjl'
As far as I can tell, this runs the regex twice. Is there a technique to do it all in one pass, perhaps using re.split? It seems that split-based solutions would also need to run the regex at least twice.
import re
r = re.compile("[ab]")
text = "abcdedfe falijbijie bbbb laifsjelifjl"
matches = []   # the matched substrings
replaced = []  # the pieces of text between matches
pos = 0        # end position of the previous match
for m in r.finditer(text):
    matches.append(m.group(0))
    replaced.append(text[pos:m.start()])
    pos = m.end()
replaced.append(text[pos:])  # text after the last match
print(matches)
print(''.join(replaced))
Outputs:
['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'a']
cdedfe flijijie  lifsjelifjl
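The finditer() loop above can be wrapped into a reusable function that still makes only one pass over the text. This is a minimal sketch; the name remove_and_collect is hypothetical, and it assumes the pattern is given as a plain string:

```python
import re

def remove_and_collect(pattern, text):
    """Hypothetical helper: one finditer() pass that returns
    (list of matched strings, text with the matches removed)."""
    regex = re.compile(pattern)
    matches = []
    kept = []
    pos = 0  # end position of the previous match
    for m in regex.finditer(text):
        matches.append(m.group(0))
        kept.append(text[pos:m.start()])  # text between previous and current match
        pos = m.end()
    kept.append(text[pos:])  # tail after the last match
    return matches, "".join(kept)

matches, rest = remove_and_collect("[ab]", "abcdedfe falijbijie bbbb laifsjelifjl")
print(matches)
print(rest)
```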
What about this:
import re
text = "abcdedfe falijbijie bbbb laifsjelifjl"
matches = []
ab_re = re.compile("[ab]")
def verboseTest(m):
    # called once per match: record it, then replace it with ''
    matches.append(m.group(0))
    return ''
textWithoutMatches = ab_re.sub(verboseTest, text)
print(matches)
# ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'a']
print(textWithoutMatches)
# cdedfe flijijie  lifsjelifjl
The 'repl' argument of re.sub can be a function. It is called once for each match with the match object, so you can report or save the match there, and whatever string the function returns is what sub substitutes in its place.
The function could easily be modified to do a lot more too! Check out the re module documentation on docs.python.org for more information on what else is possible.
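As one sketch of doing more with a function repl, a small callable class can hold the collected matches instead of a module-level list, and can return any replacement string, not just ''. The class name MatchCollector and its replacement parameter are hypothetical; re.sub() only needs an object it can call with a match object that returns a string:

```python
import re

class MatchCollector:
    """Hypothetical callable for re.sub(): records each match and
    returns a fixed replacement string (empty by default)."""

    def __init__(self, replacement=""):
        self.replacement = replacement
        self.matches = []

    def __call__(self, m):
        # re.sub() calls this once per match, passing the match object
        self.matches.append(m.group(0))
        return self.replacement

ab_re = re.compile("[ab]")
text = "abcdedfe falijbijie bbbb laifsjelifjl"

collector = MatchCollector()
cleaned = ab_re.sub(collector, text)
print(collector.matches)
print(cleaned)
```

Keeping the state on the object means each call to sub can use a fresh collector, so matches from different texts do not pile up in one shared list.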
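My revised answer uses re.split(), which does everything in one regex pass. Because the pattern is wrapped in a capturing group, split() keeps each match in its result, so the non-matching pieces sit at even indices and the matches at odd indices: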
import re
text="abcdedfe falijbijie bbbb laifsjelifjl"
ab_re = re.compile("([ab])")
tokens = ab_re.split(text)
non_matches = tokens[0::2]
matches = tokens[1::2]
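To see the interleaving concretely, here is a quick check (assuming Python 3) that compares the split() slices with the two-pass findall()/sub() results from the question. Note the empty strings where two matches are adjacent:

```python
import re

text = "abcdedfe falijbijie bbbb laifsjelifjl"
ab_re = re.compile("([ab])")

tokens = ab_re.split(text)
# Even indices: text between matches (possibly ''); odd indices: the matches.
non_matches = tokens[0::2]
matches = tokens[1::2]

# The one-pass result agrees with the two-pass versions.
assert matches == ab_re.findall(text)
assert "".join(non_matches) == ab_re.sub("", text)
print(tokens[:6])  # ['', 'a', '', 'b', 'cdedfe f', 'a']
```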
(edit: here is a complete function version)
def split_matches(text, compiled_re):
    '''Given a compiled re, split a text
    into matching and non-matching sections.
    compiled_re must contain exactly one capturing group.
    Returns m, n_m: two lists.
    '''
    tokens = compiled_re.split(text)
    matches = tokens[1::2]
    non_matches = tokens[0::2]
    return matches, non_matches
m,nm = split_matches(text,ab_re)
''.join(nm) # equivalent to ab_re.sub('',text)
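One caveat with split_matches: it relies on the pattern having exactly one capturing group, because re.split() also returns the text of every group in the pattern. A possible generalization, sketched here with a hypothetical split_matches_any helper, wraps the whole pattern in an outer group and steps through the tokens by groups + 1. It assumes the pattern string does not begin with inline global flags such as (?i), since wrapping would move them out of the leading position; pass flags= instead:

```python
import re

def split_matches_any(pattern, text, flags=0):
    """Hypothetical helper: like split_matches, but works even if
    `pattern` has its own capturing groups."""
    # Outer group 1 always captures the whole match.
    wrapped = re.compile("(" + pattern + ")", flags)
    tokens = wrapped.split(text)
    # For each match, split() emits one entry per group (None for groups
    # that did not participate), so each record is 1 + wrapped.groups long.
    stride = wrapped.groups + 1
    matches = tokens[1::stride]      # outer group = the whole match
    non_matches = tokens[0::stride]  # text between matches
    return matches, non_matches

m, nm = split_matches_any(r"(a)(b)?", "xaby a")
print(m)   # ['ab', 'a']
print(nm)  # ['x', 'y ', '']
```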