I understand that the pattern r'([a-z]+)\1+' searches for a repeated multi-character pattern in the search string, but I do not understand why, in case k2, the answer isn't 'aaaaa' (5 'a'):
import re
k1 = re.search(r'([a-z]+)\1+', 'aaaa')
k2 = re.search(r'([a-z]+)\1+', 'aaaaa')
k3 = re.search(r'([a-z]+)\1+', 'aaaaaa')
print(k1) # <_sre.SRE_Match object; span=(0, 4), match='aaaa'>
print(k2) # <_sre.SRE_Match object; span=(0, 4), match='aaaa'>
print(k3) # <_sre.SRE_Match object; span=(0, 6), match='aaaaaa'>
Python 3.6.1
Regular expressions (a.k.a. regex) are a pattern-matching language used to find string sequences in large bodies of text. Their constructs match whole families of text (alphanumerics, digits, words), which makes them versatile enough to handle almost any kind of string.
[] denotes a character class. () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z OR 0-9. (a-z0-9) matches and captures the literal text a-z0-9.
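A minimal sketch of the difference between the two constructs. The sample strings and variable names are made up for illustration.

```python
import re

# Character class: matches ONE character that is in a-z or 0-9.
class_match = re.search(r'[a-z0-9]', 'x9')

# Capturing group: matches the literal text "a-z0-9" and captures it.
group_match = re.search(r'(a-z0-9)', 'range a-z0-9 here')

# The group pattern does not match ordinary letters or digits.
no_match = re.search(r'(a-z0-9)', 'x9')

print(class_match.group(0))  # x
print(group_match.group(1))  # a-z0-9
print(no_match)              # None
```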
The regular expression \s matches a single whitespace character, while \s+ matches one or more whitespace characters.
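A quick way to see the effect of the + quantifier is to compare the spans of the two matches. The input string here is an arbitrary example with three spaces between the letters.

```python
import re

text = 'a   b'  # three spaces between 'a' and 'b'

single = re.search(r'\s', text)   # exactly one whitespace character
run = re.search(r'\s+', text)     # one or more, as many as possible

print(single.span())  # (1, 2)
print(run.span())     # (1, 4)
```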
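Because + is greedy. What happens is that ([a-z]+) first matches all of 'aaaaa', which leaves nothing for \1+ to match. The engine then backtracks, shortening the group one character at a time until \1+ can match, and stops. Because 'aa' is the first (longest) value of ([a-z]+) that lets \1 match successfully, that is the value it uses: \1+ consumes one more 'aa', the remaining single 'a' cannot form another repetition, and the match returned is 'aaaa'.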
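The backtracking can be checked by printing what group 1 captured for each input. The last two lines are an assumption about what the asker may want, not part of the original answer: making the group lazy with +? lets it start from a single 'a', so the whole of 'aaaaa' is matched.

```python
import re

pattern = re.compile(r'([a-z]+)\1+')

results = {}
for text in ('aaaa', 'aaaaa', 'aaaaaa'):
    m = pattern.search(text)
    # group(0) is the whole match, group(1) is what ([a-z]+) settled on.
    results[text] = (m.group(0), m.group(1))
    print(text, '-> match:', m.group(0), 'group 1:', m.group(1))

# aaaa   -> match: aaaa   group 1: aa
# aaaaa  -> match: aaaa   group 1: aa
# aaaaaa -> match: aaaaaa group 1: aaa

# Lazy group: tries the shortest capture first, then \1+ repeats it greedily.
lazy = re.search(r'([a-z]+?)\1+', 'aaaaa')
print(lazy.group(0), lazy.group(1))  # aaaaa a
```

With the greedy group, an odd-length run of 5 cannot be split into two or more equal pieces longer than one character, so one 'a' is always left over. The lazy variant avoids this by capturing a single 'a'.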