Regular expressions - different string the same match

Tags: python, regex

I understand that the pattern r'([a-z]+)\1+' searches for a run of one or more lowercase letters that is immediately repeated at least once, but I do not understand why the match in case k2 isn't 'aaaaa' (5 'a'):

import re
k1 = re.search(r'([a-z]+)\1+', 'aaaa')
k2 = re.search(r'([a-z]+)\1+', 'aaaaa')
k3 = re.search(r'([a-z]+)\1+', 'aaaaaa')
print(k1)  # <_sre.SRE_Match object; span=(0, 4), match='aaaa'>
print(k2)  # <_sre.SRE_Match object; span=(0, 4), match='aaaa'>
print(k3)  # <_sre.SRE_Match object; span=(0, 6), match='aaaaaa'>

Python 3.6.1

Maxim Andreev asked Feb 06 '18 16:02


1 Answer

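Because + is greedy, and the engine stops at the first overall match it finds rather than looking for the longest one.

What happens is that ([a-z]+) first matches all of 'aaaaa', which leaves nothing for \1+ to match. The engine then backtracks, shrinking the group one character at a time. With 'aaaa' in the group only 'a' remains, and with 'aaa' only 'aa' remains, so \1 fails both times. 'aa' is the first value of ([a-z]+) that lets \1 succeed: \1+ consumes one more 'aa', the leftover 'a' is too short to be another repetition, and the search stops there. The overall match is therefore 'aaaa', with 'aa' in group 1. The same reasoning gives 'aaaaaa' for k3, where the group holds 'aaa' and \1 matches once.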
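To see this for yourself, it may help to print what group 1 captured in each case. The snippet below is only a sketch: the helper name describe is made up for illustration, and the lazy variant ([a-z]+?) is my own suggestion, not something from the question. It is one possible way to get 'aaaaa', since the group then tries the shortest candidate first and the greedy \1+ takes the rest.

```python
import re

# The pattern from the question: a greedy group followed by one or more repeats.
GREEDY = re.compile(r'([a-z]+)\1+')

# Assumed alternative: a lazy group, so the shortest candidate is tried first.
LAZY = re.compile(r'([a-z]+?)\1+')


def describe(text, pattern=GREEDY):
    """Return (whole match, group 1) for the first match, or None if no match."""
    m = pattern.search(text)
    return (m.group(0), m.group(1)) if m else None


for s in ('aaaa', 'aaaaa', 'aaaaaa'):
    print(s, describe(s))
# aaaa ('aaaa', 'aa')
# aaaaa ('aaaa', 'aa')
# aaaaaa ('aaaaaa', 'aaa')

# With the lazy group, the group holds a single 'a' and \1+ consumes the other four.
print(describe('aaaaa', LAZY))  # ('aaaaa', 'a')
```

Note that the lazy version changes what the group captures (always the shortest repeating unit), so whether it suits you depends on whether you care about the group or only about the overall match.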

C_Elegans answered Sep 21 '22 20:09