In [29]: re.findall("([abc])+","abc")
Out[29]: ['c']
In [30]: re.findall("[abc]+","abc")
Out[30]: ['abc']
I'm confused by the grouped one. How does the group make a difference?
The “*” lets the preceding character (or group) repeat any number of times, including zero. Example: “ABC*” will accept any string containing “AB” followed by any number of Cs (0 to unlimited). The “+” is similar to “*”, but requires at least one occurrence. Example: “ABC+” will accept any string containing “ABC” followed by any number of additional Cs.
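A quick way to see the difference is to run both patterns over a few strings. This is only a sketch in Python's re module; the sample strings and variable names are made up for illustration.

```python
import re

# "ABC*": "AB" followed by zero or more "C" characters.
star = re.compile(r"ABC*")
# "ABC+": "AB" followed by one or more "C" characters.
plus = re.compile(r"ABC+")

samples = ["AB", "ABC", "ABCCC", "xxABCCyy"]

# search() looks for the pattern anywhere in the string.
star_matches = [bool(star.search(s)) for s in samples]
plus_matches = [bool(plus.search(s)) for s in samples]

print(star_matches)  # [True, True, True, True]
print(plus_matches)  # [False, True, True, True]
```

Only the bare "AB" separates the two: it satisfies "ABC*" (zero Cs) but not "ABC+".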
Pattern matching is used by shell commands such as ls, whereas regular expressions are used to search for strings of text in a file with commands such as grep.
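The two notations look alike but are not interchangeable. As a rough illustration, Python's fnmatch module follows shell-style pattern rules while re follows regular-expression rules; the file names below are invented for the example.

```python
import fnmatch
import re

names = ["notes.txt", "notes.txt.bak", "a.txt"]

# Shell-style pattern: "*" on its own stands for any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regular expression: "*" only repeats the item before it, so the
# equivalent pattern is spelled ".*" and the literal dot is escaped.
regex_hits = [n for n in names if re.search(r".*\.txt$", n)]

# The shell pattern is not a valid regular expression as written,
# because the leading "*" has nothing to repeat.
try:
    re.compile("*.txt")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(glob_hits)            # ['notes.txt', 'a.txt']
print(regex_hits)           # ['notes.txt', 'a.txt']
print(glob_is_valid_regex)  # False
```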
[abc] means "a or b or c", e.g. the query "[br]ang" will match both "adbarnirrang" and "bang". [^abc] means "any single character except a, b, or c", e.g. the query "[^aeou]ang" will match "rang" but not "baang".
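The same examples can be checked in Python. This is a minimal sketch using re.search(), which reports the first place the pattern fits (or None if it fits nowhere).

```python
import re

in_class = re.compile(r"[br]ang")       # "b" or "r", then "ang"
not_class = re.compile(r"[^aeou]ang")   # any character except a, e, o, u, then "ang"

hit_long = in_class.search("adbarnirrang").group(0)
hit_bang = in_class.search("bang").group(0)
hit_rang = not_class.search("rang").group(0)
miss_baang = not_class.search("baang")

print(hit_long)    # rang
print(hit_bang)    # bang
print(hit_rang)    # rang
print(miss_baang)  # None
```

In "baang" the only character directly before "ang" is an "a", which the negated class excludes, so there is no match.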
There are also two types of regular expressions: the "basic" regular expression and the "extended" regular expression.
There are two things that need to be explained here: the behavior of quantified groups, and the design of the findall() method.
In your first example, [abc] matches the a, which is captured in group #1. Then it matches b and captures it in group #1, overwriting the a. Then again with the c, and that's what's left in group #1 at the end of the match.
But it does match the whole string. If you were using search() or finditer(), you would be able to look at the MatchObject and see that group(0) contains abc and group(1) contains c. But findall() returns strings, not MatchObjects. If there are no groups, it returns a list of the overall matches; if there are groups, the list contains all the captures (as tuples when there is more than one group), but not the overall match.
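Here is a short sketch of that comparison, using the same two patterns from the question. The variable names are just for illustration.

```python
import re

# search() returns a match object, so both the overall match
# and the captured group are available.
m = re.search(r"([abc])+", "abc")
whole = m.group(0)   # the overall match
last = m.group(1)    # the last thing group #1 captured

# findall() returns only the group's capture when a group is present...
found_grouped = re.findall(r"([abc])+", "abc")
# ...and the overall match when there is no group.
found_plain = re.findall(r"[abc]+", "abc")

# finditer() yields match objects, so group(0) is still reachable.
via_finditer = [mo.group(0) for mo in re.finditer(r"([abc])+", "abc")]

print(whole, last)    # abc c
print(found_grouped)  # ['c']
print(found_plain)    # ['abc']
print(via_finditer)   # ['abc']
```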
So both of your regexes are matching the whole string, but the first one is also capturing and discarding each character individually (which is kinda pointless). It's only the unexpected behavior of findall() that makes it look like you're getting different results.
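If you do want to keep the parentheses and still get the whole run back from findall(), two common options are a non-capturing group or moving the quantifier inside the group. Neither is from the question; this is a sketch of possible alternatives, and the longer sample string is made up.

```python
import re

# (?:...) groups without capturing, so findall() falls back to the overall match.
non_capturing = re.findall(r"(?:[abc])+", "abc")

# With the + inside the parentheses, the group captures the whole run at once.
outer_group = re.findall(r"([abc]+)", "abc")

# The same non-capturing pattern on a string with several separate runs.
several_runs = re.findall(r"(?:[abc])+", "abc xyz cab")

print(non_capturing)  # ['abc']
print(outer_group)    # ['abc']
print(several_runs)   # ['abc', 'cab']
```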
In the first example you have a repeated capturing group, which only captures the last iteration. Here, c.
([abc])+
In the second example you are matching a single character from the list, between one and unlimited times.
[abc]+