I am learning about regex in Python and I have problems understanding the function groups().
>>> import re
>>> m = re.match("([abc])+", "abc")
Here I have defined the character class [abc], which, as far as I know, matches any one of the characters a, b, or c. It is inside a group, and the + sign means we want at least one repetition of that group. So I execute the following lines, and the result is understandable:
>>> m.group()
'abc'
>>> m.group(0)
'abc'
I get why this happens. The index of the main group is 0 and 'abc' matches the class we have defined. So far so good, but I don't get why the following lines get executed the way they do:
>>> m.group(1)
'c'
>>> m.groups()
('c',)
What is group(1)? I have only defined one group here, so why does the tuple returned by groups() contain only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I'd expect it to at least contain 'abc'.
What is a group in regex? A group is the part of a regex pattern enclosed in parentheses, ( and ). Capturing groups are a way to treat multiple characters as a single unit. For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't', and (dog) creates a single group containing the letters 'd', 'o', and 'g'.
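To make the numbering concrete, here is a minimal sketch with two capturing groups. The pattern and the input string "item-42" are made up for illustration and do not come from the question. It shows that group(0) is the whole match, while groups() returns only the numbered capturing groups, starting at 1.

```python
import re

# Hypothetical pattern with two capturing groups, chosen only for illustration.
m = re.match(r"(\w+)-(\d+)", "item-42")

whole = m.group(0)        # the entire matched text
first = m.group(1)        # text captured by the first pair of parentheses
second = m.group(2)       # text captured by the second pair of parentheses
all_groups = m.groups()   # capturing groups 1..N only; group 0 is not included

print(whole, first, second, all_groups)
```

This is why groups() in the question does not contain 'abc': the whole match is only available as group(0).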
When the pattern matches, match() returns a match object (and None otherwise), so you should store the result in a variable for later use. group() returns the substring that was matched by the RE. start() and end() return the starting and ending index of the match, where the ending index is one past the last matched character.
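A short sketch of those methods, using the pattern from the question. The second input, "xyz", is an assumption added here only to show the no-match case.

```python
import re

m = re.match("([abc])+", "abc")

# match() returns None when the pattern does not match at the start of the string,
# so check the result before calling methods on it.
if m is not None:
    matched = m.group()             # substring matched by the whole pattern
    start, end = m.start(), m.end() # end is one past the last matched character
    print(matched, start, end, m.span())

# Hypothetical non-matching input, to show the None case.
no_match = re.match("([abc])+", "xyz")
print(no_match)
```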
For details on the re module, consult the docs. In your case: group(0) stands for the whole matched string, hence abc. That match consists of three repetitions of your single group, matching a, then b, then c.
group(i) stands for the i-th group. Citing the documentation: "If a group matches multiple times, only the last match is accessible." Hence group(1) stands for the last match, c.
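The following sketch illustrates the "last match only" behaviour and one possible way to recover every repetition. Using re.findall on the bare character class is a suggestion added here, not something the original answer proposes.

```python
import re

m = re.match("([abc])+", "abc")

# The group matched three times (a, b, c), but only the last repetition is kept.
last_only = m.group(1)
last_span = m.span(1)   # position of that last repetition in the string

# One way to see each repetition separately: search for the class itself.
each = re.findall("[abc]", "abc")

print(last_only, last_span, each)
```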
The + in your pattern is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + into the parentheses:
>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
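If you want both the full run and the repeated inner group, one option is to nest the parentheses. This is a sketch that goes slightly beyond the answer above: groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the inner group is 2. The (?:...) form is Python's non-capturing group syntax, shown here as an alternative when the inner capture is not needed.

```python
import re

# Outer group captures the whole run; inner group still keeps only its last repetition.
nested = re.match("(([abc])+)", "abc").groups()

# Non-capturing inner group: only the outer group appears in groups().
non_capturing = re.match("((?:[abc])+)", "abc").groups()

print(nested, non_capturing)
```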