What is the process of matching this regular expression? I don't get why the explicit group is 'c'. This piece of code is taken from the Python re module documentation.
>>> m = re.match("([abc])+", "abc")
>>> m.group()
'abc'
>>> m.groups()
('c',)
Also, what about:
>>> m = re.match("([abc]+)", "abc")
>>> m.group()
'abc'
>>> m.groups()
('abc',)
And:
>>> m = re.match("([abc])", "abc")
>>> m.group()
'a'
>>> m.groups()
('a',)
Thanks.
re.match("([abc])+", "abc")
Matches a single character a, b or c one or more times, capturing it in group 1 on each repetition. A repeated group keeps only its last capture, so once the greedy + has consumed all of abc, group 1 holds the final character matched, which is c.
m = re.match("([abc]+)", "abc")
Matches a group that contains one or more consecutive occurrences of a, b or c. The captured group is the longest contiguous run of a, b or c at the start of the string.
re.match("([abc])", "abc")
Matches exactly one character: a, b or c. Because re.match only matches at the start of the string, the group is always the first character of the string, provided that character is a, b or c.
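To see the three cases side by side, here is a small sketch that runs each pattern against the same string and records the whole match, group 1, and the span of group 1. The `results` dictionary is just an illustrative name, not something from the docs.

```python
import re

text = "abc"
patterns = ["([abc])+", "([abc]+)", "([abc])"]

results = {}
for pattern in patterns:
    m = re.match(pattern, text)
    # group(0) is the whole match, group(1) is what group 1 captured last,
    # and span(1) gives the start/end indices of that capture.
    results[pattern] = (m.group(0), m.group(1), m.span(1))
    print(pattern, results[pattern])
```

For the first pattern the span of group 1 is (2, 3): the whole match covers abc, but the group only remembers the last single character it captured.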
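In your first example, ([abc])+ matches one character at a time and captures it in the same group on every repetition, so each capture overwrites the previous one. c is the explicit group because it's the last character that the regex matches. If we append an a to the string, the last character matched is a, and the group changes with it: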
>>> re.match("([abc])+", "abca").groups()
('a',)
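If you actually want every character that the repeated group matched rather than only the last one, the group itself won't give you that in the re module. One possible workaround (my suggestion, not something from the question) is to run re.findall with the character class over the matched text:

```python
import re

m = re.match("([abc])+", "abca")

# The repeated group only remembers its final capture.
last_capture = m.group(1)

# Workaround: collect each character from the matched text separately.
every_char = re.findall("[abc]", m.group(0))

print(last_capture)
print(every_char)
```

Here last_capture is 'a', while every_char is a list with one entry per matched character.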
In your second example, you're creating one group that matches one or more a's, b's, or c's in a row. Thus, you create one group for abc. If we extend abc, the group will extend with the string:
>>> re.match("([abc]+)", "abca").groups()
('abca',)
In your third example, the regex matches exactly one character at the start of the string, which must be an a, a b, or a c. Since a is the first character in abc, you get an a. This changes if we change the first character in the string:
>>> re.match("([abc])", "cba").group()
'c'
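One thing worth keeping in mind is that re.match only tries the pattern at the very start of the string. A quick sketch, using a made-up input "xbc" whose first character is not in the class, and re.search for comparison:

```python
import re

pattern = "([abc])"
text = "xbc"  # hypothetical input: first character is not a, b or c

# re.match only tries at position 0, so there is no match here.
at_start = re.match(pattern, text)

# re.search scans forward and finds the first a, b or c anywhere.
anywhere = re.search(pattern, text)

print(at_start)
print(anywhere.group(1))
```

So with re.match you get None here, and calling .group() on that would raise an AttributeError, while re.search returns a match whose group is b.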