
Bogus escape error when running regex

Tags:

python

regex

Yesterday I got help with a regex that worked well on its own, but when I put it into this code I get a "bogus escape" error. The code and traceback are below. Could you point out what I am doing wrong?

#!/usr/bin/env python
import re

sf = open("a.txt","r")
out = open("b.txt","w")
regex = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\x\w+\'\\"')


for line in sf:
    m = regex.findall(line)
    for i in m:
       print >> out,line,

The traceback is:

Traceback (most recent call last):
  File "match.py", line 6, in <module>
    regex = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\x\w+\'\\"')
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\x'

asked Aug 20 '14 15:08 by BRZ


2 Answers

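\x is not a valid special sequence on its own: the regex parser only accepts it as \xhh, with exactly two hex digits, and in your pattern it is followed by \w instead. If you want to match a literal \x, escape the backslash (\\x); if you need something else, use a valid escape, as you did with \w.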
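A quick way to confirm this is to compile each candidate pattern in isolation. The sketch below uses Python 3's re module (the question's code is Python 2.7, where the same failure is reported as "bogus escape"); the helper name compiles is just for illustration.

```python
import re


def compiles(pattern):
    """Return True if pattern is a valid regular expression, False otherwise."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# A bare \x followed by a non-hex character is rejected by the regex parser.
print(compiles(r'\w+@\w+\x\w+'))    # False
# Escaping the backslash makes the pattern match a literal backslash followed by x.
print(compiles(r'\w+@\w+\\x\w+'))   # True

# The escaped version matches text that really contains a backslash and an x.
print(re.search(r'\\x', 'host\\x1f') is not None)  # True
```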

This will compile:

re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\\x\w+\'\\"')
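For reference, here is a sketch of the question's script ported to Python 3 with this corrected pattern. The function names matching_lines and copy_matching are hypothetical, and since the real format of a.txt isn't shown, the sample line below is made up to fit the pattern. One design choice differs from the original: the Python 2 loop writes a line once per findall match, so a line with two matches would be written twice; this sketch uses search and writes each matching line once.

```python
import re

# The corrected pattern: \\x matches a literal backslash followed by "x".
REGEX = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\\x\w+\'\\"')


def matching_lines(lines):
    """Yield each line that contains at least one match of REGEX."""
    for line in lines:
        if REGEX.search(line):
            yield line


def copy_matching(src="a.txt", dst="b.txt"):
    """Copy every matching line from src to dst, keeping its newline."""
    with open(src) as sf, open(dst, "w") as out:
        out.writelines(matching_lines(sf))


# Hypothetical input: the first line holds a real backslash before x1f
# and another one before the final double quote.
sample = ["Merging 3 items = 'alice@host\\x1f'\\\"\n", "unrelated line\n"]
print(list(matching_lines(sample)))  # only the first line is kept
```

Calling copy_matching() then does what the original script intended, reading a.txt and writing the matching lines to b.txt.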
answered Oct 18 '22 00:10 by Seb D.


\x must be followed by a hex value (i.e. exactly two hex digits):

>>> '\x61'
'a'
>>> '\x'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
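The session above shows Python's own string-literal escape. The same two-hex-digit rule applies inside a regular expression, even in a raw string where Python leaves the backslash alone and the re module parses \x itself. A small Python 3 check (the variable name rejected is just for illustration):

```python
import re

# In a raw string Python leaves "\x61" untouched; the re module decodes it to "a".
print(re.fullmatch(r'\x61', 'a') is not None)   # True

# With fewer than two hex digits, the regex parser rejects the escape.
rejected = []
for pattern in (r'\x', r'\x6', r'\x61'):
    try:
        re.compile(pattern)
    except re.error as exc:
        rejected.append(pattern)
        print(pattern, 'rejected:', exc)
    else:
        print(pattern, 'compiles')
```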

If you want to match a literal \x, escape the backslash so that it matches a backslash character instead of starting an escape: \\x.

answered Oct 18 '22 01:10 by arshajii