
The groups() method in regular expressions in Python

Tags: python, regex

I am learning about regex in Python and I am having trouble understanding the groups() method.

>>> import re
>>> m = re.match("([abc])+", "abc")

Here I have defined the character class [abc], which, as far as I know, matches any one of the characters a, b, or c. It is defined inside a group, and the + sign means we want at least one such group. So I execute the following lines, and the result is understandable:

>>> m.group()
'abc'
>>> m.group(0)
'abc'

I get why this happens. The index of the main group is 0 and 'abc' matches the class we have defined. So far so good, but I don't get why the following lines get executed the way they do:

>>> m.group(1)
'c'
>>> m.groups()
('c',)

What is group(1)? I have only defined one group here, so why does the groups() method return only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I'd have supposed it would at least contain 'abc'.

Omid asked Nov 25 '13 20:11



1 Answer

For details on re, consult the docs. In your case:

group(0) stands for the entire matched string, hence abc: the single group ([abc]) matched three times, once each for a, b and c.

group(i) stands for the i-th group and, citing the documentation:

If a group matches multiple times, only the last match is accessible

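hence group(1) holds the last match, c. groups() returns only the numbered groups, from 1 upward, so it never includes the whole match; with a single group in the pattern it is the one-element tuple ('c',).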
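As a small sketch of the "only the last match is accessible" rule, the snippet below checks what group(1) holds and where it sits in the string. The findall() step is not part of the original answer; it is just one possible way to recover each repetition, and the variable names are illustrative.

```python
import re

m = re.match(r"([abc])+", "abc")

whole = m.group(0)   # the entire match
last = m.group(1)    # only the final repetition of group 1 is kept
span = m.span(1)     # (start, end) of that final repetition

# Assumption: to see every repetition, rescan the matched text
# with the unquantified character class.
each = re.findall(r"[abc]", whole)

print(whole, last, span, each)
```

The span shows that group(1) refers to the final character of the match, which is why 'a' and 'b' are no longer reachable through the match object.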

Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + inside the parentheses:

>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
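To make the relationship between group(0) and groups() more concrete, here is a hedged sketch using two variations that do not appear in the question: a non-capturing group (?:...) and a nested pair of capturing groups. Both are standard re syntax; the variable names are only for illustration.

```python
import re

# Non-capturing group: the repetition still works, but nothing is numbered.
m1 = re.match(r"(?:[abc])+", "abc")
whole1 = m1.group(0)
groups1 = m1.groups()

# Nested groups are numbered by their opening parenthesis:
# group 1 is the outer group, group 2 the repeated inner one.
m2 = re.match(r"(([abc])+)", "abc")
groups2 = m2.groups()
count2 = m2.re.groups  # number of capturing groups in the pattern

print(whole1, groups1)
print(groups2, count2)
```

With the non-capturing form, groups() comes back empty even though the whole match is still available as group(0). With the nested form, the outer group holds the full run while the inner group again holds only its last repetition.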
alko answered Oct 13 '22 01:10