I have the code below:
import re

line = "78349999234"
searchObj = re.search(r'9*', line)
if searchObj:
    # Print whatever text the first match covered
    print("searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")
However the output is empty. I thought * means: Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's. Why am I not able to see any result in this case?
$ means "match the end of the string" (the position after the last character in the string). ^ and $ are both called anchors; used together, they ensure that the entire string is matched instead of just a substring.
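As a rough Python illustration of the anchor behaviour described above (the sample strings are invented for this sketch), the same digit pattern can be tried with and without ^ and $:

```python
import re

# Hypothetical sample inputs, chosen only for illustration.
samples = ["12345", "abc123def"]

unanchored = re.compile(r"\d+")   # digits anywhere in the string
anchored = re.compile(r"^\d+$")   # the whole string must be digits

# Map each sample to (substring match found, whole-string match found).
results = {
    s: (bool(unanchored.search(s)), bool(anchored.search(s)))
    for s in samples
}

for s, (anywhere, whole) in results.items():
    print(f"{s!r}: substring match={anywhere}, whole-string match={whole}")
```

Only the all-digit string should satisfy the anchored pattern, while both strings contain a digit run somewhere.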
[] denotes a character class. () denotes a capturing group. [a-z0-9]: one character that is in the range a-z OR 0-9. (a-z0-9): explicit capture of the literal text a-z0-9.
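The difference can be checked with a short Python sketch; the input strings here are made up for the example:

```python
import re

text = "x7"  # hypothetical sample input

# Character class: matches exactly ONE character that is a-z or 0-9.
class_match = re.search(r"[a-z0-9]", text)

# Capturing group: matches the literal text "a-z0-9" and captures it.
group_miss = re.search(r"(a-z0-9)", text)
group_hit = re.search(r"(a-z0-9)", "range: a-z0-9")

print(class_match.group())   # the first character that belongs to the class
print(group_miss)            # "x7" does not contain the literal text
print(group_hit.group(1))    # the captured literal text
```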
The plus sign + is a greedy quantifier, which means one or more times. For example, expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
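A minimal Python sketch of the \s versus \s+ distinction, using an invented input with three consecutive spaces:

```python
import re

text = "a   b"  # hypothetical input: three spaces between the letters

single = re.findall(r"\s", text)   # each whitespace character separately
run = re.findall(r"\s+", text)     # one greedy match covering the whole run

print(single)
print(run)
```

Because + is greedy, the second pattern should consume the whole run of spaces in a single match instead of reporting each space on its own.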
=~ is Ruby's basic pattern-matching operator. When one operand is a regular expression and the other is a string, the regular expression is used as a pattern to match against the string. (This operator is equivalently defined by Regexp and String, so the order of the String and the Regexp does not matter.)
I think the regular expression is matched from left to right, so the first match is the empty string before 7... If the engine finds a 9, it will indeed match greedily and try to "eat" (that's the correct terminology) as many characters as possible.
If you query for:
>>> print(re.findall(r'9*', line))
['', '', '', '', '9999', '', '', '', '']
It matches all the empty strings between the characters and, as you can see, 9999 is matched as well.
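To see where each of those matches sits, one option is re.finditer, which exposes the span of every match. This is only a sketch built on the same line value; the variable names are chosen for illustration:

```python
import re

line = "78349999234"

# Collect (start, end, text) for every match of 9*, including empty ones.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(r"9*", line)]

for start, end, text in matches:
    print(f"span=({start}, {end}) text={text!r}")

# Keep only the matches that actually consumed characters.
non_empty = [m for m in matches if m[2]]
print(non_empty)
```

The first entry is the empty match at position 0, which is the one re.search reports.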
The main reason the engine returns the first match instead of looking for the longest one is probably performance: if you search for a pattern in a string of 10M+ characters, you're very happy if the pattern already occurs in the first 10k characters. You don't want to waste effort on finding the "nicest" match...
EDIT
"0 or more occurrences" means that the group (in this case 9) is repeated zero or more times. In an empty string, the character is repeated exactly 0 times. If you want to match patterns where the character is repeated one or more times, you should use
9+
This results in:
>>> print(re.search(r'9+', line))
<_sre.SRE_Match object; span=(4, 8), match='9999'>
Using re.search with a pattern that accepts the empty string is probably not very helpful, since it will always match the empty string at the very start of the string first.
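A side-by-side sketch of the two quantifiers on the same input may make this concrete. The extra string "12345" is a made-up input used only to show the no-match case:

```python
import re

line = "78349999234"

star = re.search(r"9*", line)   # may match the empty string
plus = re.search(r"9+", line)   # needs at least one 9

print(star.span(), repr(star.group()))
print(plus.span(), repr(plus.group()))

# With 9+, search can return None, so check before calling .group().
missing = re.search(r"9+", "12345")
print("no match" if missing is None else missing.group())
```

Note that 9* can never yield None from re.search, because the empty string always matches, so the truthiness check from the question only becomes meaningful with 9+.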
The main reason is that the re.search function stops searching once it finds a match. 9* means "match the digit 9 zero or more times". Because an empty string exists before each and every character, re.search stops searching after finding the first empty string. That's why you got an empty string as output...