 

Python 3 Default Encoding cp1252

I recently ran into some problems decoding a handle (with errors mapping bytes 0x81 and 0x8D) from the Biopython module, using an Anaconda 4.1.1 / Python 3.5.2 installation on a Sony VAIO running Windows 10.

After some research, it seems the problem may be that the default decoding codec is cp1252. I ran the code below and found that the default codec is indeed set to cp1252:

```python
import locale

os_encoding = locale.getpreferredencoding()
print(os_encoding)  # cp1252 on this system
```

However, several posts suggest that Python 3 should have set the default codec to UTF-8. Is that correct? If so, why is mine cp1252, and how can I solve this?

asked Feb 06 '17 14:02 by Mike


People also ask

What is Python 3 default encoding?

In Python 3, strings are sequences of Unicode code points. You only need to convert to and from bytes when writing or reading data. The default encoding for that conversion (str.encode() and bytes.decode() called with no argument) is UTF-8, but other encodings can be specified.
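A minimal sketch of that boundary (the sample string is arbitrary): the text stays a str of code points in memory, and bytes only appear once it is encoded for writing, or before it is decoded after reading.

```python
# Text in memory is a str: a sequence of Unicode code points.
text = "naïve café"

# Encoding produces bytes; the encoding is spelled out explicitly here.
data = text.encode("utf-8")

# Decoding with the same encoding restores the original str.
restored = data.decode("utf-8")

print(len(text), len(data))  # 10 12: 'ï' and 'é' each take two bytes in UTF-8
print(restored == text)      # True
```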

What is encoding='cp1252'?

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
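The question's error bytes, 0x81 and 0x8D, are two of the five byte values that Python's cp1252 codec leaves unassigned, so decoding data that was not written as cp1252 can fail on exactly those bytes. A minimal sketch, assuming (hypothetically) that the input is UTF-8 text; the question does not say what encoding the Biopython file really uses, and the helper try_decode is illustrative rather than part of any library.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or describe where and why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error at byte {exc.start}: {exc.reason}"

# 'Á' (U+00C1) encoded as UTF-8 is the two bytes 0xC3 0x81.
utf8_bytes = "Á".encode("utf-8")
print(utf8_bytes)                        # b'\xc3\x81'
print(try_decode(utf8_bytes, "utf-8"))   # Á

# cp1252 maps 0xC3 to 'Ã' but leaves 0x81 unassigned, so decoding fails.
print(try_decode(utf8_bytes, "cp1252"))  # error at byte 1: character maps to <undefined>

# List every single byte value that the cp1252 codec cannot decode.
undefined = [hex(b) for b in range(256)
             if try_decode(bytes([b]), "cp1252").startswith("error")]
print(undefined)  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Similarly, 'Í' is 0xC3 0x8D in UTF-8, so decoding it as cp1252 would fail on 0x8D.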

What are the encoding standards in Python?

Converting a string into bytes is known as encoding. There are various encodings, each of which represents a string differently; popular ones include UTF-8 and ASCII. Using the string's encode() method, you can convert a Unicode string into any encoding supported by Python. By default, str.encode() uses UTF-8.
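A short illustration of one string producing different bytes under different encodings; the sample text is arbitrary. ASCII cannot represent 'é' at all, so strict encoding raises while errors="replace" substitutes a placeholder.

```python
text = "café"

# The same str yields different bytes under different encodings.
print(text.encode("utf-8"))   # b'caf\xc3\xa9'  ('é' takes two bytes)
print(text.encode("cp1252"))  # b'caf\xe9'      ('é' takes one byte)

# ASCII has no 'é': errors="replace" substitutes '?', strict mode raises.
print(text.encode("ascii", errors="replace"))  # b'caf?'
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ascii failed: ordinal not in range(128)
```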


1 Answer

According to What’s New In Python 3.0,

There is a platform-dependent default encoding […] In many cases, but not all, the system default is UTF-8; you should never count on this default.

and

PEP 3120: The default source encoding is now UTF-8.

In other words, Python decodes source files as UTF-8 by default, but any interaction with the filesystem depends on the environment. It's strongly recommended to pass the encoding explicitly when reading a file, e.g. open(filename, encoding='utf-8').
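Both halves of that paragraph can be sketched in a few lines. This is a minimal illustration, not the only approach: compile() is fed source code as bytes to show the UTF-8 default for source files and how a PEP 263 coding cookie overrides it, and a temporary file (the name example.txt is hypothetical) is written and read back with an explicit encoding.

```python
import os
import tempfile

# Source code as bytes with no coding declaration is decoded as UTF-8 (PEP 3120).
utf8_source = "name = 'café'\n".encode("utf-8")
ns = {}
exec(compile(utf8_source, "<utf8-demo>", "exec"), ns)
print(ns["name"])  # café

# A PEP 263 coding cookie overrides that default; 0xE9 is 'é' in cp1252.
cp1252_source = b"# -*- coding: cp1252 -*-\nname = 'caf\xe9'\n"
ns2 = {}
exec(compile(cp1252_source, "<cp1252-demo>", "exec"), ns2)
print(ns2["name"])  # café

# File contents: pass the encoding explicitly on both write and read, so the
# result does not depend on the platform default (cp1252 on the question's system).
text = "naïve café\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, encoding="utf-8") as handle:
        round_trip = handle.read()

print(round_trip == text)  # True
```

In the question's scenario, a handle opened this way with the file's real encoding could then be passed to Biopython, whose Bio.SeqIO.parse(handle, format) accepts an open handle; that the file is UTF-8 is an assumption, since the question doesn't say.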

Another change is that b'bytes'.decode() and 'str'.encode() with no argument use UTF-8 instead of ASCII (the Python 2 default).
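A quick check of that default; the sample string is arbitrary. The no-argument calls use UTF-8 no matter what the locale encoding is, which is why they behave the same on the question's cp1252 system.

```python
import locale
import sys

text = "café"

# encode()/decode() without an argument use UTF-8 in Python 3,
# independent of the locale encoding printed on the last line.
assert text.encode() == text.encode("utf-8")
assert text.encode().decode() == text

print(text.encode())                       # b'caf\xc3\xa9'
print(sys.getdefaultencoding())            # utf-8
print(locale.getpreferredencoding(False))  # platform dependent; the question reports cp1252
```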

Python 3.6 changes some more defaults:

PEP 529: Change Windows filesystem encoding to UTF-8

PEP 528: Change Windows console encoding to UTF-8
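Both of these settings can be inspected at runtime. A minimal sketch: on Windows with Python 3.6+ the filesystem encoding is expected to be UTF-8 unless the PYTHONLEGACYWINDOWSFSENCODING environment variable restores the old behavior, and an interactive console is expected to use UTF-8 unless PYTHONLEGACYWINDOWSSTDIO is set. Other platforms follow the locale, so the printed values are platform dependent.

```python
import sys

# PEP 529: encoding used for file names (paths), not for file contents.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding)

# PEP 528: encoding of the standard output stream. An interactive Windows
# console uses UTF-8 on 3.6+; redirected output follows the locale instead.
# getattr guards against environments where sys.stdout is missing or replaced.
stdout_encoding = getattr(sys.stdout, "encoding", None)
print(stdout_encoding)
```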

But the default encoding for open() is still whatever Python manages to infer from the environment.
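To see what open() actually infers on a given machine, one can compare locale.getpreferredencoding(False), which the open() documentation for these Python versions names as the fallback, with the .encoding attribute of a file opened without an encoding argument. A minimal sketch; probe.txt is a hypothetical file name in a temporary directory.

```python
import codecs
import locale
import os
import tempfile

# The locale-derived encoding that open() falls back to when none is given.
preferred = locale.getpreferredencoding(False)
print(preferred)  # platform dependent; the question reports cp1252

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "probe.txt")  # hypothetical file name
    with open(path, "w") as f:              # no encoding argument
        inferred = f.encoding
    with open(path, "w", encoding="utf-8") as f:
        explicit = f.encoding

print(inferred)  # same codec as `preferred`, possibly spelled differently
print(explicit)  # utf-8
# Normalize both names through the codec registry before comparing.
print(codecs.lookup(inferred).name == codecs.lookup(preferred).name)  # True
```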

Python 3.7 added an (opt-in!) UTF-8 Mode, enabled with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable, in which the environmental locale encoding is ignored and everything is UTF-8 all the time (except for specific cases where Windows uses UTF-16, I suppose). See PEP 540 and the corresponding Issue 29240.
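A minimal sketch for checking UTF-8 Mode, assuming Python 3.7 or later: the -X utf8 option, sys.flags.utf8_mode and subprocess.run's capture_output argument do not exist in the question's Python 3.5.2. It launches child interpreters via sys.executable so the parent's settings are untouched; PROBE and probe are illustrative names.

```python
import subprocess
import sys

# Code run in a child interpreter: report whether UTF-8 Mode is on and the
# normalized name of the encoding open() would fall back to.
PROBE = (
    "import codecs, locale, sys; "
    "print(sys.flags.utf8_mode, codecs.lookup(locale.getpreferredencoding(False)).name)"
)

def probe(*extra_args: str) -> str:
    """Run PROBE in a fresh interpreter with the given command-line options."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print("default:", probe())               # depends on the environment
print("-X utf8:", probe("-X", "utf8"))   # 1 utf-8
```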

answered Sep 29 '22 02:09 by Josh Lee