Quick regular expression question.
I'm trying to capture multiple instances of a capture group in Python (I don't think it's Python-specific), but each subsequent capture seems to overwrite the previous one.
In this over-simplified example, I'm essentially trying to split a string:
import re
x = 'abcdef'
r = re.compile(r'(\w){6}')
m = r.match(x)
m.groups()  # = ('f',) ?!?
I want to get ('a', 'b', 'c', 'd', 'e', 'f'), but because each repetition overwrites the previous capture, I get ('f',).
Is this how regex is supposed to behave? Is there a way to do what I want without having to repeat the syntax six times?
Thanks in advance!
Andrew
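You can't use groups for this, I'm afraid. In Python's re module (and in most other regex engines), a repeated capturing group keeps only its last match; .NET is a notable exception, exposing every repetition through Group.Captures. A possible solution is to use findall() or similar.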
import re
x = 'abcdef'
r = re.compile(r'\w')
r.findall(x)
# ['a', 'b', 'c', 'd', 'e', 'f']
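One thing the findall() approach drops is the {6} count from the question. A minimal sketch, assuming you want both the length check and the individual characters, is to validate the whole string with re.fullmatch() and then extract the pieces with findall(); the helper name split_exact is hypothetical.

```python
import re

# Hypothetical helper: check the overall shape with one pattern,
# then pull out each repetition with a second, simpler pattern.
WHOLE = re.compile(r'\w{6}')
ITEM = re.compile(r'\w')

def split_exact(s):
    """Return the six word characters of s, or None if s is not exactly six of them."""
    if WHOLE.fullmatch(s) is None:
        return None
    return ITEM.findall(s)

print(split_exact('abcdef'))   # ['a', 'b', 'c', 'd', 'e', 'f']
print(split_exact('abcdefg'))  # None
```

Splitting validation from extraction keeps each pattern simple, at the cost of scanning the string twice.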
The third-party regex module (installed with pip install regex) can do this:
>>> import regex
>>> m = regex.match(r'(\w){6}', 'abcdef')
>>> m.captures(1)
['a', 'b', 'c', 'd', 'e', 'f']
It also works with named captures:
>>> m = regex.match(r'(?P<letter>\w){6}', 'abcdef')
>>> m.capturesdict()
{'letter': ['a', 'b', 'c', 'd', 'e', 'f']}
The regex module is designed to be backwards-compatible with the standard re module (apart from a few documented differences), while offering many more features and capabilities.
To find all matches in a given string, use re.findall(pattern, string). Also, if you want to obtain every letter here, your regex should be either '(\w){1}' or just '(\w)'.
See:
import re
x = 'abcdef'
r = re.compile(r'(\w)')
l = re.findall(r, x)
l == ['a', 'b', 'c', 'd', 'e', 'f']  # True
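What re.findall() returns depends on how many groups the pattern contains: the whole match when there are none, the group's text when there is exactly one, and a tuple per match when there are several. A short illustration on the same string:

```python
import re

x = 'abcdef'

print(re.findall(r'\w', x))        # no groups: ['a', 'b', 'c', 'd', 'e', 'f']
print(re.findall(r'(\w)', x))      # one group: ['a', 'b', 'c', 'd', 'e', 'f']
print(re.findall(r'(\w)\w', x))    # one group, extra text: ['a', 'c', 'e']
print(re.findall(r'(\w)(\w)', x))  # two groups: [('a', 'b'), ('c', 'd'), ('e', 'f')]
```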
I suppose your question is a simplified version of what you actually need, so here is a slightly more complex example:
import re
pat = re.compile('[UI][bd][ae]')
ch = 'UbaUdeIbaIbeIdaIdeUdeUdaUdeUbeIda'
print([mat.group() for mat in pat.finditer(ch)])
Result:
['Uba', 'Ude', 'Iba', 'Ibe', 'Ida', 'Ide', 'Ude', 'Uda', 'Ude', 'Ube', 'Ida']
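If each repeated unit has inner parts you want to capture separately, put the groups inside the pattern that finditer() repeats: every match object then carries its own captures, so nothing gets overwritten. A sketch using the same ch string; the group names head, mid and tail are made up for illustration.

```python
import re

ch = 'UbaUdeIbaIbeIdaIdeUdeUdaUdeUbeIda'
# Hypothetical group names: one named group per position inside a unit.
pat = re.compile(r'(?P<head>[UI])(?P<mid>[bd])(?P<tail>[ae])')

# Each match object holds the captures for one unit only.
units = [mat.groupdict() for mat in pat.finditer(ch)]
print(units[0])    # {'head': 'U', 'mid': 'b', 'tail': 'a'}
print(len(units))  # 11

# Offsets of the first few units within ch, taken from the match objects
print([mat.start() for mat in pat.finditer(ch)][:3])  # [0, 3, 6]
```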