Why isn't the regular expression's "non-capturing" group working?

Tags: python, regex

In the snippet below, the non-capturing group "(?:aaa)" should be ignored in the matching result. The result should be "_bbb" only.

However, I get "aaa_bbb" in the matching result; only when I specify group(2) does it show "_bbb".

    >>> import re
    >>> s = "aaa_bbb"
    >>> print(re.match(r"(?:aaa)(_bbb)", s).group())
    aaa_bbb

414

asked Apr 24 '10 02:04

Jim Horng

2 Answers

I think you're misunderstanding the concept of a "non-capturing group". The text matched by a non-capturing group still becomes part of the overall regex match.

Both the regex (?:aaa)(_bbb) and the regex (aaa)(_bbb) return aaa_bbb as the overall match. The difference is that the first regex has one capturing group which returns _bbb as its match, while the second regex has two capturing groups that return aaa and _bbb as their respective matches. In your Python code, to get _bbb, you'd need to use group(1) with the first regex, and group(2) with the second regex.
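The comparison above can be sketched as follows, using the string from the question. The variable names non_capturing and capturing are only illustrative.

```python
import re

s = "aaa_bbb"

# One capturing group: (?:aaa) still consumes text, so it is part of
# the overall match, but it is not assigned a group number.
non_capturing = re.match(r"(?:aaa)(_bbb)", s)
print(non_capturing.group())   # overall match
print(non_capturing.group(1))  # the only capturing group
print(non_capturing.groups())  # tuple of all capturing groups

# Two capturing groups: same overall match, but _bbb is now group 2.
capturing = re.match(r"(aaa)(_bbb)", s)
print(capturing.group())
print(capturing.group(2))
print(capturing.groups())
```

In both cases group() is the full aaa_bbb; only the numbering of the captured pieces differs.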

The main benefit of non-capturing groups is that you can add them to a regex without upsetting the numbering of the capturing groups in the regex. They also offer (slightly) better performance as the regex engine doesn't have to keep track of the text matched by non-capturing groups.
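To illustrate the numbering point, here is a small sketch. The ccc alternative is a made-up extension of the question's pattern, not something from the original code.

```python
import re

# Starting point: _bbb is capturing group 1.
original = re.compile(r"aaa(_bbb)")

# Adding a non-capturing group for alternation keeps _bbb at group 1.
extended = re.compile(r"(?:aaa|ccc)(_bbb)")

# Using a capturing group for the same job shifts _bbb to group 2.
shifted = re.compile(r"(aaa|ccc)(_bbb)")

for text in ("aaa_bbb", "ccc_bbb"):
    print(text, "->", extended.match(text).group(1))

m = shifted.match("ccc_bbb")
print(m.group(1), m.group(2))
```

Code that already reads group(1) from the original pattern keeps working with the extended one, but would silently pick up the prefix with the shifted one.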

If you really want to exclude aaa from the overall regex match, then you need to use lookaround. In this case, positive lookbehind does the trick: (?<=aaa)_bbb. With this regex, group() returns _bbb in Python, provided you call re.search rather than re.match: the match now starts at index 3, and re.match only tries index 0. No capturing groups needed.
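A minimal sketch of the lookbehind approach, again with the string from the question. Note that a pattern starting with a lookbehind cannot match at the very start of the string, which is why re.search is used here.

```python
import re

s = "aaa_bbb"

# re.match only tries position 0, where there is no preceding text,
# so the lookbehind cannot succeed and the result is None.
print(re.match(r"(?<=aaa)_bbb", s))

# re.search scans forward and finds _bbb at index 3, preceded by aaa.
m = re.search(r"(?<=aaa)_bbb", s)
print(m.group())
print(m.span())
```

The lookbehind checks that aaa precedes the match without consuming it, so the overall match covers only _bbb.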

My recommendation is that if you have the ability to use capturing groups to get part of the regex match, use that method instead of lookaround.

160

answered Sep 17 '22 20:09

Jan Goyvaerts

group() and group(0) will return the entire match. Subsequent groups are actual capture groups.

    >>> import re
    >>> string1 = "aaa_bbb"
    >>> print(re.match(r"(?:aaa)(_bbb)", string1).group(0))
    aaa_bbb
    >>> print(re.match(r"(?:aaa)(_bbb)", string1).group(1))
    _bbb
    >>> print(re.match(r"(?:aaa)(_bbb)", string1).group(2))
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IndexError: no such group

If you want a single string, as group() gives you, but built only from the captured groups:

    " ".join(re.match(r"(?:aaa)(_bbb)", string1).groups())
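As a rough sketch of how joining groups() behaves once there is more than one capturing group: the pattern below is a hypothetical variation on the question's regex, chosen only to show the effect of the separator.

```python
import re

string1 = "aaa_bbb"

# Hypothetical variation: capture aaa and bbb, skip the underscore.
m = re.match(r"(aaa)(?:_)(bbb)", string1)

print(m.groups())            # only the captured pieces
print(" ".join(m.groups()))  # joined with a space between groups
print("".join(m.groups()))   # concatenated with no separator
```

With a single capturing group the separator never appears, so both joins give the same string.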

answered Sep 20 '22 20:09

Richard Simões
