I'm working on a little Python script that is supposed to match a series of authors, and I'm using the re module for that. I came across something unexpected and have been able to reduce it to the following very simple example:
>>> import re
>>> s = "$word1$, $word2$, $word3$, $word4$"
>>> word = r'\$(word\d)\$'
>>> m = re.match(word+'(?:, ' + word + r')*', s)
>>> m.groups()
('word1', 'word4')
So I'm defining a 'basic' regexp that matches the main parts of my input, with some recognizable features (in this case I used the $ signs), and then I try to match one word plus a possible additional list of words.
I'd have expected m.groups() to display:
>>> m.groups()
('word1', 'word2', 'word3', 'word4')
But apparently I'm doing something wrong. I'd like to know why this solution does not work and how to change it, such that I get the result I'm looking for. BTW, this is with Python 2.6.6 on a Linux machine, in case that matters.
Although your regex matches every $word#$, the second capture group is continuously overwritten by the last item matched.
Let's take a look at the debugger:
>>> expr = r"\$(word\d)\$(?:, \$(word\d)\$)*"
>>> c = re.compile(expr, re.DEBUG)
literal 36
subpattern 1
  literal 119
  literal 111
  literal 114
  literal 100
  in
    category category_digit
literal 36
max_repeat 0 65535
  subpattern None
    literal 44
    literal 32
    literal 36
    subpattern 2
      literal 119
      literal 111
      literal 114
      literal 100
      in
        category category_digit
    literal 36
As you can see, there are only 2 capture groups: subpattern 1 and subpattern 2. Every time another $word#$ is found, subpattern 2 gets overwritten.
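The overwriting can also be checked directly on the match object, without reading the debug dump. This is a minimal sketch using only the standard re module and the same string and expression as above; the variable names (group_count, whole_match, last_capture) are just illustrative.

```python
import re

s = "$word1$, $word2$, $word3$, $word4$"
pattern = re.compile(r"\$(word\d)\$(?:, \$(word\d)\$)*")
m = pattern.match(s)

# The pattern contains exactly two capturing groups, however often
# the (?:...)* part repeats while matching.
group_count = pattern.groups

# The whole input is consumed by the match...
whole_match = m.group(0)

# ...but group 2 only remembers its final repetition.
first_capture = m.group(1)
last_capture = m.group(2)
```

So the match itself succeeds over the full string; it is only the intermediate captures of group 2 that are discarded.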
As for a potential solution, I would recommend using re.findall() instead of re.match():
>>> s = "$word1$, $word2$, $word3$, $word4$"
>>> authors = re.findall(r"\$(\w+)\$", s)
>>> authors
['word1', 'word2', 'word3', 'word4']
There are only two capture groups in your regexp. Try re.findall(word, s) instead.
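If the comma-separated structure of the whole string matters too, one possible approach is to validate the layout with a non-capturing pattern first and only then pull the items out with re.findall(). The sketch below assumes that requirement; LIST_RE and extract_words are hypothetical names, not part of the original code.

```python
import re

# Same 'basic' regexp as in the question: one capture group per item.
WORD = r"\$(word\d)\$"

# Hypothetical helper pattern: checks the "item, item, ..." layout of the
# whole string without adding any capture groups. \Z anchors at the end.
LIST_RE = re.compile(r"\$word\d\$(?:, \$word\d\$)*\Z")


def extract_words(s):
    """Return all items if s is a well-formed list, otherwise None."""
    if LIST_RE.match(s) is None:
        return None
    # With a single capture group, findall returns the captured text
    # of every non-overlapping match.
    return re.findall(WORD, s)


words = extract_words("$word1$, $word2$, $word3$, $word4$")
rejected = extract_words("$word1$; $word2$")
```

Here words holds all four items, while the second call returns None because the separator does not fit the expected layout.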
Repeated captures are supported by the third-party regex module.
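As a rough sketch of what that looks like: the regex module's match objects offer a captures() method that returns every capture of a group rather than only the last one. The example assumes regex has been installed separately (for instance with pip install regex); the import guard and the all_words name are only there for illustration.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

s = "$word1$, $word2$, $word3$, $word4$"
expr = r"\$(word\d)\$(?:, \$(word\d)\$)*"

if regex is not None:
    m = regex.match(expr, s)
    # captures(n) lists every string group n matched, in order,
    # instead of keeping only the final repetition.
    all_words = m.captures(1) + m.captures(2)
else:
    all_words = None  # regex is not installed in this environment
```

The original two-group expression can stay unchanged; only the way the captures are read differs.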