
Python re.finditer error at special characters

Tags:

python-3.x

I have code that loops through a list of strings and, for each one, loops through every occurrence of that string inside another string. This works until it reaches a string that begins with a question mark ("? I").

This is the code.

dtID = 0
for datum in sorted(datumList, key=operator.attrgetter('Sum'), reverse = True):
    datum.ID = dtID
    for foundDatum in re.finditer(datum.Name, text):
        datumLocList.append(DatumLoc(dtID,foundDatum.start()))
    dtID += 1

How can I solve this?

Traceback (most recent call last):
  File "C:\Users\trist\Documents\Python\The Compressor\The Compressor.py", line 97, in <module>
    compress()
  File "C:\Users\trist\Documents\Python\The Compressor\The Compressor.py", line 73, in compress
    for foundDatum in re.finditer(datum.Name, text):
  File "C:\Program Files\Python37\lib\re.py", line 230, in finditer
    return _compile(pattern, flags).finditer(string)
  File "C:\Program Files\Python37\lib\re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Program Files\Python37\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Program Files\Python37\lib\sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Program Files\Python37\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Program Files\Python37\lib\sre_parse.py", line 651, in _parse
    source.tell() - here + len(this))
re.error: nothing to repeat at position 0
Tristan King asked Jul 03 '26 04:07



1 Answer

re.finditer treats its first argument as a regular expression, so the question mark in your string is interpreted as a special character rather than as a literal. The ? symbol is a quantifier that matches 0 or 1 repetitions of the preceding expression. Because ? is the first character of your pattern, there is nothing before it to repeat, hence the 'nothing to repeat at position 0' error.
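
The failure can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical string "? I" standing in for one of the datum.Name values; compiling it as a pattern raises the same re.error seen in the traceback.

```python
import re

# Hypothetical stand-in for a datum.Name value that begins with "?"
name = "? I"

try:
    re.compile(name)
    message = None
except re.error as exc:
    # The leading "?" is parsed as a quantifier with nothing before it
    message = str(exc)

print(message)
```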

To avoid this, pass the string through the re.escape() function, which escapes all regular expression special characters so that the pattern matches the text literally.

for foundDatum in re.finditer(re.escape(datum.Name), text):
    datumLocList.append(DatumLoc(dtID,foundDatum.start()))

See https://docs.python.org/3/library/re.html#re.escape
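
As a self-contained illustration of the fix, the sketch below uses made-up sample values for text and name (they are not from the original program) and collects the start offset of each literal occurrence.

```python
import re

# Hypothetical sample data, chosen only for illustration
text = "Who am I? I am here. ? I wonder."
name = "? I"

# re.escape makes the "?" (and any other special character) match literally
escaped = re.escape(name)

# Start offset of every literal occurrence of name in text
starts = [match.start() for match in re.finditer(escaped, text)]

print(starts)  # [8, 21]
```

Escaping is only appropriate when the names are meant to be matched as plain text; if some names are intended to be real regular expressions, they would need to be handled separately.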

Robbie Dunn answered Jul 05 '26 05:07
