
"Nothing to repeat" from Python regex

Tags: python, regex

Here is a regex - attempted by egrep and then by Python 2.7:

$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'

/some/path/to/file/abcde.csv

However, the same regex in Python:

re.match(r'*([a-zA-Z]+)\.csv',f )

Gives:

Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Doing a search reveals there appears to be a Python bug in play here:

regex error - nothing to repeat

It seems to be a python bug (that works perfectly in vim). The source of the problem is the (\s*...)+ bit.

However, it is not clear to me: what then is the workaround for my regex shown above - to make python happy?

Thanks.

WestCoastProjects asked Jul 13 '15 14:07




1 Answer

You do not need the * in the pattern. It causes the error because a quantifier has to follow something it can repeat, and at the very beginning of the pattern there is nothing before it (only an empty string) to quantify.

The same "nothing to repeat" error occurs when you:

  • Place any quantifier (+, ?, *, {2}, {4,5}, etc.) at the start of the pattern (e.g. re.compile(r'?'))
  • Add any quantifier right after the ^ / \A start-of-string anchor (e.g. re.compile(r'^*'))
  • Add any quantifier right after the $ / \Z end-of-string anchor (e.g. re.compile(r'$*'))
  • Add any quantifier after a word boundary (e.g. re.compile(r'\b*\d{5}'))
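The cases listed above can be checked with a small script. This is only a sketch, written for Python 3 (the question uses 2.7, where the message has no position suffix); compile_error is a hypothetical helper name, not part of the re module.

```python
import re

# Patterns taken from the list above, plus the one from the question.
patterns = [r'?', r'^*', r'$*', r'\b*\d{5}', r'*([a-zA-Z]+)\.csv']


def compile_error(pattern):
    """Return the re.error message for an invalid pattern, or None if it compiles.

    Hypothetical helper, used here for illustration only.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for p in patterns:
    # Each of these should report a "nothing to repeat" error.
    print(repr(p), '->', compile_error(p))
```

Dropping the leading quantifier, as in ([a-zA-Z]+)\.csv, makes compile_error return None.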

Note, however, that Python's re does let you quantify a lookaround: (?<!\d)*abc and (?<=\d)?abc both yield the same matches as plain abc, because the quantifier makes the lookaround optional.

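Use

([a-zA-Z]+)\.csv

with re.search, because re.match only matches at the start of the string and the path starts with /. Or, to keep re.match and match from the start of the string:

.*?([a-zA-Z]+)\.csv

The .*? here is deliberately lazy. A greedy .* also matches, but it consumes as much as it can and leaves only the last letter before .csv for the capturing group.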
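To see how these patterns behave on the path from the question, here is a short comparison. It is a sketch for Python 3; the variable names are made up for illustration. It also covers a lazy variant, .*?([a-zA-Z]+)\.csv, because re.match and a greedy .* each change what group(1) returns.

```python
import re

f = '/some/path/to/file/abcde.csv'  # sample path from the question

# re.match anchors at the start of the string, so the bare pattern
# cannot match a path that begins with "/".
bare_match = re.match(r'([a-zA-Z]+)\.csv', f)

# re.search scans the string for the first position where the pattern matches.
bare_search = re.search(r'([a-zA-Z]+)\.csv', f).group(1)

# A greedy .* leaves only one letter for the capturing group.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', f).group(1)

# A lazy .*? stops as early as possible, so the group gets the whole name.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', f).group(1)

print(bare_match)   # None
print(bare_search)  # abcde
print(greedy)       # e
print(lazy)         # abcde
```

If some of the input files might not match at all, check the result for None before calling .group(1).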


The reason is that * is unescaped and is therefore treated as a quantifier, which applies to the preceding subpattern in the regex. Here it sits at the very beginning of the pattern, so there is no preceding subpattern for it to quantify, and the "nothing to repeat" error is raised.
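If a literal asterisk is ever what you want, escaping it avoids the quantifier reading. A minimal sketch for Python 3, using a made-up sample string:

```python
import re

# Unescaped: the leading * is read as a quantifier with nothing before it.
try:
    re.compile(r'*abc')
    leading_star_error = None
except re.error as exc:
    leading_star_error = exc

# Escaped: \* matches a literal asterisk.
literal_star = re.search(r'\*abc', 'x*abc')

# re.escape builds the escaped form from plain text.
escaped = re.escape('*abc')

print(leading_star_error)
print(literal_star.group())
print(escaped)
```

This does not apply to the pattern in the question, where the * should simply be removed.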

If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (just as Java does with an unescaped [ and ] inside a character class like [([)]]).

Wiktor Stribiżew answered Oct 07 '22 13:10