Here is a regex, tried first with egrep and then with Python 2.7:
$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'
/some/path/to/file/abcde.csv
However, the same regex in Python:
re.match(r'*([a-zA-Z]+)\.csv',f )
Gives:
Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
A search suggests that a Python bug may be in play here:
regex error - nothing to repeat
It seems to be a Python bug (the pattern works perfectly in Vim). The source of the problem is the (\s*...)+ bit.
However, it is not clear to me what the workaround is for my regex above. What do I need to change to make Python happy?
Thanks.
In a regex, a "dot" (.) matches any single character except a newline, and * means "zero or more instances of the preceding regex token".
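A quick illustration of both tokens with Python's re module, assuming Python 3 (re.fullmatch was added in 3.4); the sample strings are made up for the example:

```python
import re

# '.' matches any single character except a newline (by default).
dot_any = re.fullmatch(r'a.c', 'abc') is not None        # True
dot_newline = re.fullmatch(r'a.c', 'a\nc') is not None   # False

# '*' repeats the preceding token zero or more times.
star_zero = re.fullmatch(r'ab*c', 'ac') is not None      # True (no b)
star_many = re.fullmatch(r'ab*c', 'abbbc') is not None   # True (three b's)

# Combined, '.*' means "any run of characters, possibly empty".
dot_star = re.fullmatch(r'.*\.csv', '/some/path/to/file/abcde.csv') is not None  # True

print(dot_any, dot_newline, star_zero, star_many, dot_star)
```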
The sub() function belongs to Python's regular expressions (re) module. It returns a string in which all matching occurrences of the specified pattern are replaced by the replacement string. To use it, import the re module first: import re.
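A minimal sketch of re.sub(pattern, repl, string), assuming Python 3; the input strings are illustrative only:

```python
import re

# re.sub(pattern, repl, string) returns a new string in which every
# non-overlapping match of pattern has been replaced by repl.
masked = re.sub(r'\d+', '#', 'host1 host22')
print(masked)  # host# host#

# Replacing with an empty string deletes the matches, e.g. a .csv suffix.
name = re.sub(r'\.csv$', '', 'abcde.csv')
print(name)  # abcde
```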
The re.match() function checks for a match only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise it returns None, even if the pattern would match later in the string, for example on another line (this holds even with re.MULTILINE). Use re.search() to find a match anywhere in the string.
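The difference matters for the question's paths. A sketch assuming Python 3, using the path from the egrep example:

```python
import re

path = '/some/path/to/file/abcde.csv'
pattern = r'([a-zA-Z]+)\.csv'

# re.match() only tries position 0. The path starts with '/', which is not
# a letter, so there is no match and None is returned.
no_match = re.match(pattern, path)
print(no_match)  # None

# re.search() tries every position and returns the first match it finds.
found = re.search(pattern, path).group(1)
print(found)  # abcde

# Even with re.MULTILINE, re.match() does not look at later lines.
later_line = re.match(r'b', 'a\nb', re.MULTILINE)
print(later_line)  # None
```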
re.escape(text) returns a string with a backslash \ inserted in front of every non-alphanumeric character (since Python 3.7, only in front of characters that are special in regular expressions). This is useful when you want to use a given string as a pattern that matches it exactly.
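For example, escaping a string that contains the troublesome * turns it into a literal character. A sketch assuming Python 3 (for this particular input, Python 2.7 produces the same escaped string):

```python
import re

escaped = re.escape('*.csv')
print(escaped)  # \*\.csv

# The escaped pattern matches the text literally: '*' is no longer a
# quantifier (so no "nothing to repeat" error) and '.' is no longer a wildcard.
literal_hit = re.match(escaped, '*.csv') is not None  # True
other_hit = re.match(escaped, '*xcsv') is not None    # False
print(literal_hit, other_hit)
```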
You do not need the * in the pattern. It causes the issue because you are trying to quantify the beginning of the pattern, where there is nothing (only an empty string) to quantify.
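A minimal sketch (Python 3 syntax, using the path from the question) that reproduces the error and shows the pattern working once the * is removed:

```python
import re

path = '/some/path/to/file/abcde.csv'

# With the leading *, compilation fails: there is no token before it to repeat.
try:
    re.compile(r'*([a-zA-Z]+)\.csv')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)  # e.g. "nothing to repeat at position 0"

# Without the *, the pattern compiles. re.search() is used because the
# name sits in the middle of the path, not at its start.
host = re.search(r'([a-zA-Z]+)\.csv', path).group(1)
print(host)  # abcde
```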
The same "nothing to repeat" error occurs when you
- place any quantifier (+, ?, *, {2}, {4,5}, etc.) at the start of the pattern (e.g. re.compile(r'?'))
- add any quantifier right after the ^ / \A start-of-string anchor (e.g. re.compile(r'^*'))
- add any quantifier right after the $ / \Z end-of-string anchor (e.g. re.compile(r'$*'))
- add any quantifier right after a word boundary (e.g. re.compile(r'\b*\d{5}'))
Note, however, that in Python re you may quantify any lookaround: e.g. (?<!\d)*abc and (?<=\d)?abc will yield the same matches as plain abc, since the lookarounds are optional.
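The cases listed above can be checked with a short loop. This is a sketch assuming Python 3; the exact error text (such as the position suffix) may vary between versions, so only the "nothing to repeat" part is relied on:

```python
import re

# Quantifiers with nothing (or only a zero-width anchor) in front of them.
bad_patterns = [
    r'?', r'+', r'*', r'{2}', r'{4,5}',  # at the start of the pattern
    r'^*', r'\A*',                       # after a start-of-string anchor
    r'$*', r'\Z*',                       # after an end-of-string anchor
    r'\b*\d{5}',                         # after a word boundary
]

failures = {}
for pat in bad_patterns:
    try:
        re.compile(pat)
    except re.error as exc:
        failures[pat] = str(exc)

for pat, msg in failures.items():
    print(repr(pat), '->', msg)

# Quantified lookarounds are accepted: the lookaround just becomes optional.
compiled = [re.compile(p) for p in (r'(?<!\d)*abc', r'(?<=\d)?abc')]
```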
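Use
([a-zA-Z]+)\.csv
with re.search(), because re.match() only tries to match at the very start of the string, and your paths start with /. Or, to match the whole string with re.match():
.*?([a-zA-Z]+)\.csv
The .*? must be lazy: with a greedy .*, the engine backtracks only as far as it has to, so ([a-zA-Z]+) would capture just the last letter before .csv (e instead of abcde).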
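A sketch of these patterns applied to the question's list comprehension, assuming Python 3. The infiles list is hypothetical, standing in for the question's input. It also shows why a greedy .* leaves only one letter in the capture group when the pattern has to match from the start of the string:

```python
import re

# Hypothetical input list standing in for the question's `infiles`.
infiles = ['/some/path/to/file/abcde.csv', '/data/hostb.csv']

# re.search() scans the string, so the unanchored pattern finds the name.
hosts = [re.search(r'([a-zA-Z]+)\.csv', f).group(1) for f in infiles]
print(hosts)  # ['abcde', 'hostb']

# re.match() is anchored at the start, so the pattern must consume the path.
# A greedy .* backtracks only as far as needed, leaving just one letter.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', infiles[0]).group(1)
print(greedy)  # e

# A lazy .*? (or an explicit .*/ before the group) captures the full name.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', infiles[0]).group(1)
print(lazy)  # abcde
```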
The reason is that the * is unescaped, so it is treated as a quantifier, which applies to the preceding subpattern in the regex. Here it appears at the very beginning of the pattern, where there is nothing to quantify, so Python raises a "nothing to repeat" error.
If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (just as Java does with unescaped [ and ] inside a character class like [([)]]).