Difference between two regular expressions: [abc]+ and ([abc])+

Tags: python, regex

In [29]: re.findall("([abc])+","abc")
Out[29]: ['c']

In [30]: re.findall("[abc]+","abc")
Out[30]: ['abc']

I'm confused by the grouped one. Why does adding the group make a difference?

user3015347 asked Feb 28 '16 02:02



2 Answers

There are two things that need to be explained here: the behavior of quantified groups, and the design of the findall() method.

In your first example, [abc] matches the a, which is captured in group #1. Then it matches b and captures it in group #1, overwriting the a. Then again with the c, and that's what's left in group #1 at the end of the match.

But it does match the whole string. If you were using search() or finditer(), you would be able to look at the match object and see that group(0) contains abc and group(1) contains c. But findall() returns strings (or tuples of strings), not match objects. If the pattern has no groups, it returns a list of the overall matches; if it has one group, the list contains that group's captures; if it has several groups, the list contains tuples of the captures. In the grouped cases the overall match is not included.
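To see this for yourself, here is a minimal sketch (not part of the original answer) that inspects the match object directly. The second input string, "abcxba", is just an illustrative value chosen to produce two matches.

```python
import re

# search() returns a match object, so both the overall match
# and the group's final capture are available.
m = re.search(r"([abc])+", "abc")
overall = m.group(0)      # the whole match
last_capture = m.group(1) # only the last iteration of group 1 survives

print(overall)       # abc
print(last_capture)  # c

# finditer() yields one match object per match; "abcxba" is a
# made-up input that contains two separate runs of a/b/c.
pairs = [(mo.group(0), mo.group(1))
         for mo in re.finditer(r"([abc])+", "abcxba")]
print(pairs)         # [('abc', 'c'), ('ba', 'a')]
```

In each pair the first element is the overall match and the second is whatever was left in group 1 when the match ended.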

So both of your regexes are matching the whole string, but the first one is also capturing and discarding each character individually (which is kinda pointless). It's only the unexpected behavior of findall() that makes it look like you're getting different results.
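If you want to keep the parentheses and still get the whole match back from findall(), two common options are a non-capturing group or an extra outer group. This is a sketch of those alternatives, which the original answer does not mention explicitly:

```python
import re

text = "abc"

# One capturing group: findall() returns that group's last capture.
capturing = re.findall(r"([abc])+", text)

# Non-capturing group (?:...): no groups, so findall() returns whole matches.
non_capturing = re.findall(r"(?:[abc])+", text)

# Outer group around the repetition: two groups, so findall() returns
# tuples of (outer capture, last inner capture).
nested = re.findall(r"(([abc])+)", text)

print(capturing)      # ['c']
print(non_capturing)  # ['abc']
print(nested)         # [('abc', 'c')]
```

The non-capturing form is usually the simplest choice when the group is only there for grouping and its contents are not needed.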

Alan Moore answered Oct 21 '22 16:10


In the first example you have a repeated capturing group, which only keeps its last iteration. Here that is c.

([abc])+

In the second example you are matching a single character from the set, repeated between one and unlimited times. There is no group, so the whole match is returned.

[abc]+

styvane answered Oct 21 '22 16:10