The groups() method in regular expressions in Python

Tags:

python

regex

I am learning about regex in Python and I am having trouble understanding the groups() method.

>>> import re
>>> m = re.match("([abc])+", "abc")

Here I have defined the character class [abc], which, as far as I know, matches any one of the characters a, b, or c. It is defined inside a group, and the + sign means we want at least one repetition of that group. So I execute the following lines, and the result is understandable:

>>> m.group()
'abc'
>>> m.group(0)
'abc'

I get why this happens. The whole match has index 0, and 'abc' matches the pattern we have defined. So far so good, but I don't understand why the following lines behave the way they do:

>>> m.group(1)
'c'
>>> m.groups()
('c',)

What is group(1)? I have only defined one group here, so why does groups() contain only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I would expect it to contain at least 'abc'.

276

asked Nov 25 '13 20:11

Omid

1 Answer

For details on re, consult the documentation. In your case:

group(0) is the entire matched string, hence abc. Within that match the group ([abc]) was applied three times, matching a, then b, then c.

group(i) returns the i-th group. Quoting the documentation:

If a group matches multiple times, only the last match is accessible

Hence group(1) returns the last match, c.
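A minimal sketch of that behaviour, using the pattern from the question. It also shows that groups() lists only the numbered groups (1 and up), which is why 'abc' from group(0) does not appear in the tuple. The variable name m is taken from the question; nothing else is assumed.

```python
import re

m = re.match(r"([abc])+", "abc")

# group(0) is the whole match; groups() lists only the numbered groups 1..N.
print(m.group(0))   # abc
print(m.groups())   # ('c',)

# The pattern defines exactly one capturing group, however often it repeats.
print(m.re.groups)  # 1

# span(1) gives the position of the last iteration of group 1.
print(m.span(1))    # (2, 3)
```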

Your + is applied to the group as a whole, so the group itself is repeated. If you want to repeat [abc] inside the group, move the + into the parentheses:

>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
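If the goal is to see every character the repeated group matched, one possible approach is to collect the matches with re.findall instead of relying on a repeated group. The sketch below is an illustration rather than the only way to do it; the names text, chars, nested and plain are hypothetical. Note that findall scans the whole string, not just a run at the start.

```python
import re

text = "abc"

# With no capturing group, findall returns every non-overlapping match.
chars = re.findall(r"[abc]", text)
print(chars)            # ['a', 'b', 'c']

# An outer group captures the full run; the inner group still keeps
# only its last iteration.
nested = re.match(r"(([abc])+)", text)
print(nested.groups())  # ('abc', 'c')

# A non-capturing group (?:...) can be repeated without creating a
# numbered group, so groups() is empty.
plain = re.match(r"(?:[abc])+", text)
print(plain.group(0))   # abc
print(plain.groups())   # ()
```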

177

answered Oct 13 '22 01:10

alko
