Could someone explain the difference between these three patterns:
1 -> (.*)
2 -> (.*?)
3 -> .*
As I understand it, ? makes the preceding character optional, so why put it there?
And why not put the parentheses at the end?
This comes from here: http://www.tutorialspoint.com/python/python_reg_expressions.htm
First example: searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)
.* matches zero or more of any character except a newline (newlines are matched too if re.DOTALL is used). It is greedy: it matches as much as it can.
(.*) matches the same thing and also stores it in a capture group.
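A minimal sketch of that difference, using an assumed sample string (the variable names are only for illustration): both patterns match the same text, but only the parenthesized one exposes it through group(1) and groups().

```python
import re

text = "Cats are smarter than dogs"  # assumed sample input

# .* matches the whole line, but nothing is captured
plain = re.match(r".*", text)
print(plain.group(0))    # Cats are smarter than dogs
print(plain.groups())    # ()

# (.*) matches the same text and also stores it as group 1
captured = re.match(r"(.*)", text)
print(captured.group(1))   # Cats are smarter than dogs
print(captured.groups())   # ('Cats are smarter than dogs',)
```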
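(.*?) The ? makes the .* non-greedy (lazy): it matches as little as it can while still letting the overall match succeed, and the parentheses make it a capture group as well. When ? follows a quantifier such as *, + or ?, it no longer means "optional"; it changes how that quantifier behaves.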
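To see the three pieces working together, here is a sketch that runs the pattern from the question on an assumed sample line (the tutorial's actual input may differ), then swaps the lazy (.*?) for a greedy (.*) for comparison:

```python
import re

line = "Cats are smarter than dogs"  # assumed sample input

# Pattern from the question: greedy group, " are ", lazy group, space, uncaptured tail
lazy = re.search(r"(.*) are (.*?) .*", line, re.M | re.I)
print(lazy.groups())    # ('Cats', 'smarter')

# Same pattern with the second group made greedy: it grabs up to the LAST space
greedy = re.search(r"(.*) are (.*) .*", line, re.M | re.I)
print(greedy.groups())  # ('Cats', 'smarter than')
```

The trailing .* is not in parentheses, so whatever it consumes ends up in the overall match (group(0)) but not in groups().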
For example:
>>> import re
>>> txt = ''' foo
... bar
... baz '''
>>> for found in re.finditer('(.*)', txt):
...     print(found.groups())
...
(' foo',)
('',)
('bar',)
('',)
('baz ',)
('',)
>>> for found in re.finditer('.*', txt):
...     print(found.groups())
...
()
()
()
()
()
()
>>> for found in re.finditer('.*', txt, re.DOTALL):
...     print(found.groups())
...
()
()
>>> for found in re.finditer('(.*)', txt, re.DOTALL):
...     print(found.groups())
...
(' foo\nbar\nbaz ',)
('',)
And since .*? matches as little as possible, with nothing after it in the pattern it only ever matches the empty string, once at each of the 14 positions in the 13-character string:
>>> for found in re.finditer('(.*?)', txt, re.DOTALL):
...     print(found.groups())
...
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
('',)
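The empty matches above show that .*? is rarely useful on its own; it pays off when something follows it that the match has to reach. A small sketch with a made-up tag string (the input is purely illustrative):

```python
import re

tags = "<b>bold</b> and <i>italic</i>"  # made-up sample input

# Greedy: (.*) runs on to the LAST '>' in the string, giving one big match
greedy = re.findall(r"<(.*)>", tags)
print(greedy)  # ['b>bold</b> and <i>italic</i']

# Lazy: (.*?) stops at the FIRST '>' after each '<'
lazy = re.findall(r"<(.*?)>", tags)
print(lazy)    # ['b', '/b', 'i', '/i']
```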