I have to validate the following string format:
text-text-id-text
The separator is the character '-'. The third column must always be id. I wrote the following regex (in Python) to validate the string:
import re
s = 'col1-col2-col3-id' # any additional text at the end
# is allowed e.g. -col4-col5
print(re.match(r'^(.*-){3}id(-.*)?$', s))  # ok
print(re.match(r'^(.*-){1}id(-.*)?$', s))  # still ok, but it should not be
I tried switching to a non-greedy (lazy) quantifier, but the result is still the same:
^(.*?-){1}id(-.*)?$
What am I missing in my regex? I could just validate the string like this:
>>> import re
>>> print(re.split('-', 'col1-col2-col3-id'))
['col1', 'col2', 'col3', 'id']
And then check whether the third element is id. However, I am interested in why my regex behaves the way described above.
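A minimal Python 3 sketch of that split-based check, assuming "third column" means index 2 of the split result. The function name `third_column_is_id` is hypothetical, and the sketch does not reject empty columns such as '--id'.

```python
def third_column_is_id(s: str) -> bool:
    # A single literal separator does not need re.split; str.split is enough.
    parts = s.split('-')
    # Hypothetical rule: the third column (index 2) must be exactly 'id'.
    return len(parts) >= 3 and parts[2] == 'id'


print(third_column_is_id('col1-col2-id'))       # True
print(third_column_is_id('col1-col2-id-col4'))  # True
print(third_column_is_id('col1-col2-col3-id'))  # False
```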
[] denotes a character class and () denotes a capturing group. [a-z0-9] matches one character in the range a-z or 0-9; (a-z0-9) captures the literal text a-z0-9.
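A small Python 3 illustration of that difference (the sample strings are made up for the example; `re.fullmatch` requires Python 3.4 or later):

```python
import re

# [a-z0-9] is a character class: exactly one character from a-z or 0-9.
one_char = bool(re.fullmatch(r'[a-z0-9]', '7'))
many_chars = bool(re.fullmatch(r'[a-z0-9]', 'a-z0-9'))

# (a-z0-9) is a capturing group around literal text: outside a class,
# '-' is an ordinary character, so the group matches the string 'a-z0-9'.
literal = re.fullmatch(r'(a-z0-9)', 'a-z0-9').group(1)

print(one_char, many_chars, literal)  # True False a-z0-9
```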
The {n,} quantifier matches the preceding element at least n times, where n is a non-negative integer. {n,} is greedy; its lazy equivalent is {n,}?.
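A quick Python 3 check of greedy versus lazy {n,} on a made-up input. Note that laziness only changes which length is tried first; it does not stop the engine from trying longer lengths if the rest of the pattern needs them.

```python
import re

text = 'aaaa'
greedy = re.match(r'a{2,}', text).group()   # takes as many 'a's as possible
lazy = re.match(r'a{2,}?', text).group()    # stops at the minimum of two
print(greedy, lazy)  # aaaa aa
```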
The regular expression \s matches a single whitespace character, while \s+ matches one or more whitespace characters.
+ means one or more (1+); e.g., [0-9]+ matches one or more digits, such as '123' or '000'. * means zero or more (0+); e.g., [0-9]* matches zero or more digits: it accepts everything [0-9]+ accepts, plus the empty string.
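The two quantifier notes above can be checked in Python 3 with a few made-up inputs:

```python
import re

text = 'a  b\tc'
single = re.findall(r'\s', text)   # each whitespace character on its own
runs = re.findall(r'\s+', text)    # maximal runs of whitespace
print(single)  # [' ', ' ', '\t']
print(runs)    # ['  ', '\t']

samples = ['123', '000', '']
plus = {s: re.fullmatch(r'[0-9]+', s) is not None for s in samples}
star = {s: re.fullmatch(r'[0-9]*', s) is not None for s in samples}
print(plus)  # {'123': True, '000': True, '': False}
print(star)  # {'123': True, '000': True, '': True}
```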
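Your first regex is incorrect because it asserts that id is present after the first three items, i.e. in the fourth column rather than the third.
Your second regex matches the string incorrectly because .* matches hyphens as well, so a single repetition of (.*-) can consume col1-col2-col3- in one go.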
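To see this in Python 3, print what the group actually captured. This sketch (an illustration, not part of the original answer) shows that both the greedy and the lazy version end up swallowing three columns, hyphens included: the lazy .*? starts empty, but the engine keeps extending it until '-id' can follow, so laziness changes the search order, not whether a match exists.

```python
import re

s = 'col1-col2-col3-id'

# Greedy: .* first takes the whole string, then backtracks until '-' followed by 'id' fits.
greedy = re.match(r'^(.*-){1}id(-.*)?$', s)
# Lazy: .*? starts empty and grows one character at a time until the rest matches.
lazy = re.match(r'^(.*?-){1}id(-.*)?$', s)

print(greedy.group(1))  # col1-col2-col3-
print(lazy.group(1))    # col1-col2-col3-
```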
You should use this regex:
/^(?:[^-]+-){2}id/
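A sketch of the same pattern in Python 3. The surrounding slashes are delimiter notation (as in JavaScript), not part of the pattern, so Python's `re` takes only the text between them. Because [^-] cannot match a hyphen, each repetition consumes exactly one column. The sample strings are made up for illustration.

```python
import re

# Pattern from the answer, without the /.../ delimiters.
pattern = re.compile(r'^(?:[^-]+-){2}id')

samples = ['col1-col2-id', 'col1-col2-id-col4', 'col1-col2-col3-id', 'col1-id']
results = {s: bool(pattern.match(s)) for s in samples}
print(results)
# {'col1-col2-id': True, 'col1-col2-id-col4': True, 'col1-col2-col3-id': False, 'col1-id': False}
```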
Here is a regex demo!
If you need to anchor the regex to the end, use /^(?:[^-]*-){2}id.*$/.
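A quick Python 3 check of the end-anchored version, with made-up inputs. Because it uses [^-]* and a trailing .*, it also accepts empty leading columns and a third column that merely starts with id:

```python
import re

# End-anchored pattern from the answer, without the /.../ delimiters.
anchored = re.compile(r'^(?:[^-]*-){2}id.*$')

samples = ['col1-col2-id', 'col1-col2-id-col4', '--id', 'col1-col2-idx']
results = {s: bool(anchored.match(s)) for s in samples}
print(results)
# {'col1-col2-id': True, 'col1-col2-id-col4': True, '--id': True, 'col1-col2-idx': True}
```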
As Tim Pietzcker pointed out, consider also asserting that id ends the item, i.e. that it is followed by a hyphen or the end of the string:
/^(?:[^-]+-){2}id(?![^-])/
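The same check in Python 3, again without the slash delimiters and with made-up inputs. (?![^-]) is a negative lookahead: the character after id must not be a non-hyphen, so it is either '-' or the end of the string.

```python
import re

# Lookahead version from the answer, without the /.../ delimiters.
pattern = re.compile(r'^(?:[^-]+-){2}id(?![^-])')

samples = ['col1-col2-id', 'col1-col2-id-col4', 'col1-col2-idx', 'col1-col2-col3-id']
results = {s: bool(pattern.match(s)) for s in samples}
print(results)
# {'col1-col2-id': True, 'col1-col2-id-col4': True, 'col1-col2-idx': False, 'col1-col2-col3-id': False}
```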
Here is an UPDATED regex demo!