While working through Google's 2010 Python class, I found the following documentation:
'*' -- 0 or more occurrences of the pattern to its left
But when I tried the following:
re.search(r'i*','biiiiiiiiiiiiiig').group()
I expected 'iiiiiiiiiiiiii' as output but got ''. Why?
[] denotes a character class and () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z or 0-9. (a-z0-9) captures the literal text a-z0-9, because a hyphen has no range meaning outside a character class.
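A small sketch of the difference between the two, using made-up sample strings ('Q7x' and 'range: a-z0-9' are only illustrations):

```python
import re

# [a-z0-9] is a character class: it matches exactly one character from the set.
class_match = re.search(r'[a-z0-9]', 'Q7x')

# (a-z0-9) is a group around the literal text "a-z0-9".
group_match = re.search(r'(a-z0-9)', 'range: a-z0-9')

# The group pattern finds nothing in a string that lacks that literal text.
no_match = re.search(r'(a-z0-9)', 'Q7x')

print(class_match.group())   # 7  ('Q' is uppercase, so '7' is the first hit)
print(group_match.group(1))  # a-z0-9
print(no_match)              # None
```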
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ). For example, \. matches "." , \+ matches "+" , and \( matches "(" . To match a backslash itself, use \\ .
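As a rough illustration of escaping, here is a sketch on a made-up sample string. The re.escape call at the end is not mentioned above; it is a standard re helper that adds the backslashes for you.

```python
import re

# Raw string, so the single backslash in the path is kept as-is.
text = r'1+1=2 (approx.) C:\temp'

plus = re.findall(r'\+', text)       # literal "+"
dot = re.findall(r'\.', text)        # literal "."
paren = re.findall(r'\(', text)      # literal "("
backslash = re.findall(r'\\', text)  # literal backslash

print(plus, dot, paren, backslash)

# re.escape builds an escaped pattern from arbitrary literal text.
escaped = re.escape('a.b')
print(escaped)  # a\.b
```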
To replace text in Python, use re.sub() from the standard re module (remember to import re). It searches the string for the pattern, replaces each match with the given replacement, and returns the resulting new string.
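A minimal sketch of re.sub, with a made-up input string. Note that the original string is left untouched, since Python strings are immutable:

```python
import re

original = 'biiiig'

# Replace each run of one or more i's with a single 'u'.
replaced = re.sub(r'i+', 'u', original)

print(replaced)  # bug
print(original)  # biiiig
```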
* means 0 or more, but re.search returns only the first match. Here the first match is an empty string, so you get an empty string as output. Change * to + to get the desired output.
>>> re.search(r'i*','biiiiiiiiiiiiiig').group()
''
>>> re.search(r'i+','biiiiiiiiiiiiiig').group()
'iiiiiiiiiiiiii'
Consider this example.
>>> re.search(r'i*','biiiiiiiiiiiiiig').group()
''
>>> re.search(r'i*','iiiiiiiiiiiiiig').group()
'iiiiiiiiiiiiii'
In the second example, i* returns iiiiiiiiiiiiii because the regex engine starts at the first position and tries to match zero or more i characters there. Since the string begins with i, the pattern greedily consumes all the consecutive i's. If the string does not begin with i (consider the string biiiiiiig), i* still succeeds at the first position: zero occurrences is a valid match, so it matches the empty string before b. In general, i* matches an empty string before every character it cannot consume, which here means before b and before g. Because re.search returns only the first (leftmost) match, you get the empty string that sits before the non-matching b.
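One way to see this for yourself is to look at where each match starts and ends. The sketch below uses a shortened string ('biiig') for readability; span() is a standard method on match objects.

```python
import re

text = 'biiig'

m_star = re.search(r'i*', text)  # zero occurrences is allowed
m_plus = re.search(r'i+', text)  # at least one i is required

print(m_star.span(), repr(m_star.group()))  # (0, 0) ''
print(m_plus.span(), repr(m_plus.group()))  # (1, 4) 'iii'
```

The (0, 0) span shows that i* succeeded at the very start of the string without consuming anything, so the engine never needed to look further.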
Why did I get three empty strings as output in the example below?
>>> re.findall(r'i*','biiiiiiiiiiiiiig')
['', 'iiiiiiiiiiiiii', '', '']
As I explained earlier, the pattern matches an empty string at every position where it cannot consume a character. The regex engine scans the input from left to right.
The first empty string appears because the pattern i* cannot match the character b, but it does match the empty string that exists before b.
The engine then moves to the next character, which is i. Our pattern i* matches it and greedily consumes the following i's as well, so you get iiiiiiiiiiiiii as the second result.
After matching all the i's, the engine is at the character g, which i* cannot match. So i* matches the empty string before the non-matching g. That is the third result (the second empty string).
Finally, i* matches the empty string at the end of the input, after g. That is the fourth result (the third empty string).
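The walk-through above can be checked with re.finditer, which reports the position of every match. This is a sketch on a shortened string ('biiig'), run under Python 3:

```python
import re

text = 'biiig'

# Collect (start, end, matched_text) for every match of i*.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'i*', text)]
for start, end, matched in spans:
    print(start, end, repr(matched))
# 0 0 ''      empty string before b
# 1 4 'iii'   the run of i's
# 4 4 ''      empty string before g
# 5 5 ''      empty string at the end of the input

# If only the real runs are wanted, drop the empty matches (or use i+).
non_empty = [s for s in re.findall(r'i*', text) if s]
print(non_empty)  # ['iii']
```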