Regex group match exactly n times

Tags:

python

string

regex

I have to validate the following string format:

text-text-id-text

The separator is the character '-', and the third column must always be id. I wrote the following regex (in Python) to validate the string:

import re

s = 'col1-col2-col3-id'  # any additional text at the end
                         # is allowed, e.g. -col4-col5
print(re.match(r'^(.*-){3}id(-.*)?$', s))  # ok
print(re.match(r'^(.*-){1}id(-.*)?$', s))  # still matches, but it should not

I tried making the quantifier non-greedy, but the result is still the same:

^(.*?-){1}id(-.*)?$

What am I missing in my regex? I could just validate the string like this:

>>> import re
>>> print(re.split('-', 'col1-col2-col3-id'))
['col1', 'col2', 'col3', 'id']

And then check whether the third element is id, but I would like to understand why the regex behaves as described above.
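The split-based check mentioned above could look like the following minimal sketch. The function name third_column_is_id is hypothetical, and the sketch assumes the string must contain at least three columns; str.split is used because the separator is a single fixed character, though re.split works the same way here.

```python
def third_column_is_id(s):
    """Return True if the third '-'-separated column of s is exactly 'id'."""
    parts = s.split('-')
    # Require at least three columns before looking at index 2.
    return len(parts) >= 3 and parts[2] == 'id'


print(third_column_is_id('text-text-id-text'))  # True
print(third_column_is_id('col1-col2-col3-id'))  # False: 'id' is the fourth column
```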


asked Aug 15 '14 11:08

broadband

1 Answer

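Your first regex is incorrect because it requires three items before id, which makes id the fourth column rather than the third.
Your second regex matches the string because .* matches hyphens as well, so the single repetition of (.*-) can consume col1-col2-col3- in one go. Making it lazy (.*?) does not help: the engine simply keeps expanding it until id can follow.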
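To see this, inspect what the capturing group actually consumed. In this small sketch, using the test string from the question, both the greedy and the lazy version end up with the one repetition swallowing everything up to the hyphen before id:

```python
import re

s = 'col1-col2-col3-id'

# Greedy: .* first grabs the whole string, then backtracks
# until a '-' followed by 'id' is found.
greedy = re.match(r'^(.*-){1}id(-.*)?$', s)
print(greedy.group(1))  # col1-col2-col3-

# Lazy: .*? first tries 'col1-', but 'id' does not follow,
# so it keeps expanding until it reaches the same hyphen.
lazy = re.match(r'^(.*?-){1}id(-.*)?$', s)
print(lazy.group(1))  # col1-col2-col3-
```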

You should use this regex:

/^(?:[^-]+-){2}id/
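The slashes are regex-tester notation; in Python you pass only the pattern itself. A quick check, with test strings made up for illustration:

```python
import re

# The answer's pattern without the /.../ delimiters.
# (?:...) is a non-capturing group; [^-]+ matches one column containing no hyphen.
pattern = re.compile(r'^(?:[^-]+-){2}id')

for s in ['text-text-id-text', 'col1-col2-col3-id', 'col1-id-col3']:
    print(s, bool(pattern.match(s)))
# text-text-id-text True
# col1-col2-col3-id False
# col1-id-col3 False
```

Because [^-]+ cannot cross a hyphen, each repetition of the group consumes exactly one column, so {2} really means "two columns before id".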

Here is a regex demo!

And if you need to anchor the regex to the end, use /^(?:[^-]+-){2}id.*$/!

As Tim Pietzcker pointed out, you should also assert that id ends the item, so that a third column such as idx is rejected:

/^(?:[^-]+-){2}id(?![^-])/
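A small Python comparison of the pattern without and with the lookahead (test strings invented for illustration). The only input where they disagree is a third column that merely starts with id:

```python
import re

loose = re.compile(r'^(?:[^-]+-){2}id')
# (?![^-]) fails if the next character is anything other than '-',
# so 'id' must be followed by a hyphen or by the end of the string.
strict = re.compile(r'^(?:[^-]+-){2}id(?![^-])')

for s in ['col1-col2-id', 'col1-col2-id-col4', 'col1-col2-idx']:
    print(s, bool(loose.match(s)), bool(strict.match(s)))
# col1-col2-id True True
# col1-col2-id-col4 True True
# col1-col2-idx True False
```

A negative lookahead for a non-hyphen is used instead of (?=-) because (?=-) would reject a string that ends right after id.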

Here is an UPDATED regex demo!


answered Oct 06 '22 21:10

Unihedron
