I have a regex to detect invalid XML 1.0 characters in a Unicode string:
bad_xml_chars = re.compile(u'[^\x09\x0A\x0D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]', re.U)
On Linux with Python 2.7, this works perfectly. On Windows, the following error is raised:
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
Any ideas why this isn't compiling on Windows?
You have a narrow Python build on Windows, so unicode strings are stored internally as UTF-16. This means that any character above \uFFFF is stored as a surrogate pair and shows up as two separate characters in the Python string. You should see something like this:
>>> len(u'\U00010000')
2
>>> u'\U00010000'[0]
u'\ud800'
>>> u'\U00010000'[1]
u'\udc00'
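The two halves above follow from the standard UTF-16 surrogate-pair arithmetic. Here is a minimal sketch of that calculation; the helper name to_surrogate_pair is made up for illustration and is not part of the standard library:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate halves."""
    offset = code_point - 0x10000        # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


# The two endpoints of the last range in the question's character class.
for cp in (0x10000, 0x10FFFF):
    high, low = to_surrogate_pair(cp)
    print('U+%06X -> \\u%04x \\u%04x' % (cp, high, low))
```

This gives \ud800 \udc00 for the start of the range and \udbff \udfff for the end, which are the four code units a narrow build puts into the character class.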
Here is how the regex engine will attempt to interpret your string on narrow builds:
[^\x09\x0A\x0D\u0020-\ud7ff\ue000-\ufffd\ud800\udc00-\udbff\udfff]
You can see here that \udc00-\udbff is where the "bad character range" error comes from: the start of that range (\udc00) is greater than its end (\udbff).
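One possible workaround, offered as a sketch and not as the only way to do it, is to pick the pattern based on sys.maxunicode. On a narrow build the pattern treats a high surrogate followed by a low surrogate as a valid character and flags unpaired surrogates. The names _VALID_BMP and _pattern are introduced here for illustration, and the narrow branch can only be exercised on a narrow Python 2 build:

```python
import re
import sys

# Characters that XML 1.0 allows within the Basic Multilingual Plane.
_VALID_BMP = u'\x09\x0A\x0D\u0020-\uD7FF\uE000-\uFFFD'

if sys.maxunicode > 0xFFFF:
    # Wide build (and Python 3.3+): characters above U+FFFF are single
    # code points, so the original range compiles as written.
    _pattern = u'[^' + _VALID_BMP + u'\U00010000-\U0010FFFF]'
else:
    # Narrow build: characters above U+FFFF are surrogate pairs.
    _pattern = (
        # Anything outside the valid BMP set that is not a surrogate.
        u'[^' + _VALID_BMP + u'\uD800-\uDFFF]'
        # A high surrogate that is not followed by a low surrogate.
        u'|[\uD800-\uDBFF](?![\uDC00-\uDFFF])'
        # A low surrogate that is not preceded by a high surrogate.
        u'|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'
    )

bad_xml_chars = re.compile(_pattern, re.U)
```

With this in place, bad_xml_chars.search(text) should return None for text that contains only valid XML 1.0 characters, and bad_xml_chars.sub(u'', text) can be used to strip the invalid ones.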