
Backslash in a character set of a Python regexp (how to specify a 'not a backslash' character set)?

Tags: python, regex

I want to use a Python regexp to remove the comments in a LaTeX file. In LaTeX, a comment starts with "%". But if the % character is escaped ("\%"), then it's not a comment; it's a literal percent sign.

This task is just one among many regexps that I apply to my LaTeX text. I store all these regexps in a list of dicts.

The problem I face is that the regexp I use for pruning the comments does not work, because I do not know how to specify the character set 'not backslash'. The backslash in the character set escapes the closing ']', so the regexp is invalid.
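A minimal reproduction of this failure, as a sketch: it compiles only the problematic pattern in isolation, and the variable names are illustrative rather than taken from the code in this question.

```python
import re

# Inside [...] a backslash escapes the next character, so in this pattern
# the "]" is read as a literal member of the set and the set never closes.
broken = r'[^\]%.*'

try:
    re.compile(broken)
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```

The compile step raises `re.error`, so the pattern is rejected before any substitution is attempted.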

My code:

regexps = []
regexps.append({'left': r'%.*', 'right': ''})  # strips all the comments, but also mangles escaped percent characters (\%)
regexps.append({'left': r'[^\]%.*', 'right': ''})  # incorrect: the backslash escapes the closing "]"
return applyRegexps(latexText, regexps)


import re

testMode = False  # assumed debug flag; not defined in the original snippet


def applyRegexps(text, listRegExp):
    """Successively apply a list of regexp substitutions to a text."""
    if testMode:
        print(listRegExp)
    # Each element maps 'left' (the pattern) to 'right' (its replacement).
    for element in listRegExp:
        left = element['left']
        right = element['right']
        r = re.compile(left)
        text = r.sub(right, text)
    return text

Any help will be much appreciated. Thanks!

Gilles

Asked by user1821466 on Nov 13 '12 at 17:11


1 Answer

Simply double the backslash, and use a raw string literal so that you don't have to double it again for Python's own string escaping:

regexps.append({'left':r'[^\\]%.*', 'right':r''})
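As a quick check, the sketch below runs that pattern on a few sample lines. The sample lines and the lookbehind variant `(?<!\\)%.*` are illustrative additions, not part of the answer. Note that `[^\\]` has to consume one character, so the pattern also deletes the character just before the `%` and does not match a `%` at the very start of the string; a negative lookbehind avoids both effects.

```python
import re

# Hypothetical sample lines, chosen to exercise the edge cases.
lines = [
    r"50\% of cases % real comment",
    r"% full-line comment",
    r"x = 1% trailing comment",
]

doubled = re.compile(r'[^\\]%.*')       # pattern from the answer
lookbehind = re.compile(r'(?<!\\)%.*')  # alternative: assert "no backslash before %"

for line in lines:
    print(repr(doubled.sub('', line)), '|', repr(lookbehind.sub('', line)))

# Prints:
# '50\\% of cases' | '50\\% of cases '
# '% full-line comment' | ''
# 'x = ' | 'x = 1'
```

Neither pattern handles an escaped backslash directly followed by a comment (`\\%`), which would need a more careful rule.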
Answered by Martijn Pieters on Sep 28 '22 at 14:09