Switching to Python 3 causing UnicodeDecodeError

I've just added a Python 3 interpreter to Sublime, and the following code stopped working:

    for directory in directoryList:
        fileList = os.listdir(directory)
        for filename in fileList:
            filename = os.path.join(directory, filename)
            currentFile = open(filename, 'rt')
            for line in currentFile:  ## Here comes the exception.
                currentLine = line.split(' ')
                for word in currentLine:
                    if word.lower() not in bigBagOfWords:
                        bigBagOfWords.append(word.lower())
            currentFile.close()

I get the following exception:

      File "/Users/Kuba/Desktop/DictionaryCreator.py", line 11, in <module>
        for line in currentFile:
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 305: ordinal not in range(128)

I found this rather strange, because as far as I know Python 3 is supposed to support UTF-8 everywhere. What's more, the exact same code works with no problems on Python 2.7. I've read about adding the environment variable PYTHONIOENCODING, and I tried it, to no avail. (However, it appears it is not that easy to add an environment variable in OS X Mavericks, so maybe I did something wrong when adding the variable? I modified /etc/launchd.conf.)

asked May 28 '14 17:05

3yakuya

1 Answer

Python 3 decodes text files when reading and encodes them when writing. The default encoding is taken from locale.getpreferredencoding(False), which evidently returns 'ASCII' for your setup. See the open() function documentation:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
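As a quick check of what that default is on a given machine, a sketch like the following could be run under the same interpreter that Sublime launches. The variable name is illustrative only, and the printed value will differ from one setup to another.

```python
import locale

# The encoding open() falls back to in text mode when no
# encoding argument is passed (per the documentation quoted above).
default_encoding = locale.getpreferredencoding(False)

print("Default text-mode encoding:", default_encoding)
```

If this prints an ASCII variant, any non-ASCII byte in a file opened without an explicit encoding would be expected to raise the error from the question.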

Instead of relying on a system setting, you should open your text files using an explicit codec:

currentFile = open(filename, 'rt', encoding='latin1')

where you set the encoding parameter to match the file you are reading.
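To illustrate the difference, here is a minimal, self-contained sketch. The sample bytes and the file name are made up for the demonstration; they only assume a file containing the byte 0xcc mentioned in the traceback, not the actual contents of your files.

```python
import os
import tempfile

# Hypothetical sample data: byte 0xcc is outside the ASCII range.
raw = b"caf\xcc bar\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Forcing the ASCII codec reproduces the failure from the question.
    try:
        with open(path, "rt", encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit codec that covers the byte decodes without error.
    with open(path, "rt", encoding="latin1") as f:
        words = f.read().split()

print(ascii_failed)
print(words)
```

Note that latin1 maps every possible byte to a character, so it never raises; if the files are really in another encoding such as UTF-8, it will silently produce the wrong characters. Pick the codec that matches how the files were actually written.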

Python 3 supports UTF-8 as the default for source code. That is a separate matter from the encoding used when opening text files.

The same applies to writing to a writable text file; data written will be encoded, and if you rely on the system encoding you are liable to get UnicodeEncodeError exceptions unless you explicitly set a suitable codec. Which codec to use when writing depends on what text you are writing and what you plan to do with the file afterward.
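The writing side can be sketched the same way. The sample text and file name below are hypothetical, and UTF-8 is chosen only as an example of a codec that can represent the text; it is not necessarily the right choice for every file.

```python
import os
import tempfile

# Hypothetical sample text containing non-ASCII characters.
text = "na\u00efve caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")

    # An ASCII codec cannot represent the accented characters.
    try:
        with open(path, "wt", encoding="ascii") as f:
            f.write(text)
        ascii_write_failed = False
    except UnicodeEncodeError:
        ascii_write_failed = True

    # An explicit codec that covers the text writes it cleanly.
    with open(path, "wt", encoding="utf-8") as f:
        f.write(text)

    # Read the raw bytes back to see what was stored on disk.
    with open(path, "rb") as f:
        written = f.read()

print(ascii_write_failed)
print(written)
```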

You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which explains both about source code encoding and reading and writing Unicode data.

answered Sep 20 '22 04:09

Martijn Pieters

Tags: python, python-3.x, encoding
