Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UnicodeEncodeError when using the compile function

Using python 3.2 in Windows 7 I am getting the following in IDLE:

>>compile('pass', r'c:\temp\工具\module1.py', 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

Can anybody explain why the compile statement tries to convert the unicode filename using mbcs? I know that sys.getfilesystemencoding returns 'mbcs' in Windows, but I thought that this is not used when unicode file names are provided.

for example:

f = open(r'c:\temp\工具\module1.py') 

works.

For a more complete test save the following in a utf8 encoded file and run it using the standard python.exe version 3.2

# -*- coding: utf8 -*-
fname = r'c:\temp\工具\module1.py'
# I do have the a file named fname but you can comment out the following two lines
f = open(fname)
print('ok')
cmp = compile('pass', fname, 'exec')
print(cmp)

Output:

ok
Traceback (most recent call last):
  File "module8.py", line 6, in <module>
    cmp = compile('pass', fname, 'exec')
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: inval
id character
like image 230
PyScripter Avatar asked Jan 10 '12 04:01

PyScripter


People also ask

How do I fix UnicodeEncodeError in Python?

Only a limited number of Unicode characters are mapped to strings. Thus, any character that is not-represented / mapped will cause the encoding to fail and raise UnicodeEncodeError. To avoid this error use the encode( utf-8 ) and decode( utf-8 ) functions accordingly in your code.

Why does Unicode error occur in Python?

When we use such a string as a parameter to any function, there is a possibility of the occurrence of an error. Such error is known as Unicode error in Python. We get such an error because any character after the Unicode escape sequence (“ \u ”) produces an error which is a typical error on windows.

What is Unicodeescape error in Python?

The Python "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position" occurs when we have an unescaped backslash character in a path. To solve the error, prefix the path with r to mark it as a raw string, e.g. r'C:\Users\Bob\Desktop\example.

What is Unicode decode error?

The UnicodeDecodeError normally happens when decoding an str string from a certain coding. Since codings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the coding-specific decode() to fail.


2 Answers

From Python issue 10114, it seems that the logic is that all filenames used by Python should be valid for the platform where they are used. It is encoded using the filesystem encoding to be used in the C internals of Python.

I agree that it probably shouldn't throw an error on Windows, because any Unicode filename is valid. You may wish to file a bug report with Python for this. But be aware that the necessary changes might not be trivial, because any C code using the filename has to have something to do if it can't be encoded.

like image 169
Thomas K Avatar answered Oct 23 '22 07:10

Thomas K


Here a solution that worked for me: Issue 427: UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-6: ordinal not in range (128):

If you look the PyScripter help file in the topic "Encoded Python Source Files" (last paragraph) it tells you how to configure Python to support other encodings by modifying the site.py file. This file is in the lib subdirectory of the Python installation directory. Find the function setencoding and make sure that the support locale aware default string encodings is on. (see below)

def setencoding():
  """Set the string encoding used by the Unicode implementation.  The
  default is 'ascii', but if you're willing to experiment, you can
  change this."""
  encoding = "ascii" # Default value set by _PyUnicode_Init()
  if 0:  <<<--- set this to 1 ---------------------------------
      # Enable to support locale aware default string encodings.
      import locale
      loc = locale.getdefaultlocale ()
      if loc[1]:
          encoding = loc[1]
  if 0:
      # Enable to switch off string to Unicode coercion and implicit
      # Unicode to string conversion.
      encoding = "undefined"
  if encoding != "ascii":
      # On Non-Unicode builds this will raise an AttributeError...
      sys.setdefaultencoding (encoding) # Needs Python Unicode
build !
like image 38
Framester Avatar answered Oct 23 '22 07:10

Framester