"Nothing to repeat" from Python regex

Tags:

python

regex

Here is a regex - attempted by egrep and then by Python 2.7:

$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'

/some/path/to/file/abcde.csv

However, the same regex in Python:

re.match(r'*([a-zA-Z]+)\.csv',f )

Gives:

Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Doing a search reveals there appears to be a Python bug in play here:

regex error - nothing to repeat

It seems to be a python bug (that works perfectly in vim). The source of the problem is the (\s*...)+ bit.

However, it is not clear to me: what then is the workaround for my regex shown above - to make python happy?

Thanks.


asked Jul 13 '15 14:07

WestCoastProjects

1 Answer

You do not need the * in the pattern. It causes the error because you are trying to quantify the beginning of the pattern, where there is nothing (an empty string) to quantify.

The same "nothing to repeat" error occurs when you:

Place any quantifier (+, ?, *, {2}, {4,5}, etc.) at the start of the pattern (e.g. re.compile(r'?'))
Add any quantifier right after the ^ / \A start-of-string anchor (e.g. re.compile(r'^*'))
Add any quantifier right after the $ / \Z end-of-string anchor (e.g. re.compile(r'$*'))
Add any quantifier right after a word boundary (e.g. re.compile(r'\b*\d{5}'))

Note, however, that in Python re, you may quantify any lookaround, e.g. (?<!\d)*abc and (?<=\d)?abc will yield the same matches since the lookarounds are optional.
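The failing cases listed above can be checked with a short sketch. It assumes Python 3 (where the message also carries a position suffix), and the helper name compile_error is hypothetical, introduced only for illustration.

```python
import re

# Patterns taken from the list above: each quantifier has nothing before it,
# or only an anchor / word boundary before it.
bad_patterns = [r'?', r'*([a-zA-Z]+)\.csv', r'^*', r'$*', r'\b*\d{5}']


def compile_error(pattern):
    """Return the error message raised by re.compile, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for p in bad_patterns:
    print(repr(p), '->', compile_error(p))

# Escaping the star makes it a literal character, so this one compiles.
print(compile_error(r'\*([a-zA-Z]+)\.csv'))
```

Each of the listed patterns should report a message containing "nothing to repeat", while the escaped variant compiles without error.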

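Use the following with re.search (re.match only matches at the start of the string, so on a path that begins with / it would return None):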

([a-zA-Z]+)\.csv

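Or, to match the whole string with re.match, put a lazy .*? in front (a greedy .* would leave only the last letter, e, for the capturing group):

.*?([a-zA-Z]+)\.csv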
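As a rough sketch, assuming Python 3 and the sample path from the question, here is how the suggested pattern behaves under re.search and re.match, and how a greedy versus a lazy leading .* changes the captured group. The function name host_from_path is hypothetical.

```python
import re

path = '/some/path/to/file/abcde.csv'


def host_from_path(p):
    """Return the letters right before '.csv', or None if there is no match."""
    m = re.search(r'([a-zA-Z]+)\.csv', p)  # search scans the whole string
    return m.group(1) if m else None


print(host_from_path(path))

# re.match only tries at position 0, where the path has '/', so no match.
anchored = re.match(r'([a-zA-Z]+)\.csv', path)
print(anchored)

# A greedy .* swallows as much as it can and leaves one letter for the group.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', path).group(1)
print(greedy)

# A lazy .*? gives up characters early, so the group gets the full name.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', path).group(1)
print(lazy)
```

If the directory part is not needed at all, re.search with the short pattern is probably the simpler choice.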


The reason is that the * is unescaped and is therefore treated as a quantifier, which applies to the preceding subpattern in the regex. Here it sits at the very beginning of the pattern, so there is no preceding subpattern for it to quantify, and the "nothing to repeat" error is raised.

If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (just as Java does with an unescaped [ and ] inside a character class like [([)]]).


answered Oct 07 '22 13:10

Wiktor Stribiżew
