 

Persist UTF-8 as Default Encoding

Tags: python, utf-8, utf
I tried to persist UTF-8 as the default encoding in Python.

First I checked the current default:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

And I also tried:

>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('UTF8')
>>> sys.getdefaultencoding()
'UTF8'
>>> 

But after closing the session and opening a new session, the following was the result:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

How can I persist my changes? (I know that changing the default to UTF-8 is not always a good idea. This is inside a Python Docker container.)

I know it's possible. I saw someone who has UTF-8 as his default encoding (always).

DenCowboy asked Apr 14 '16 12:04


3 Answers

First, this is almost certainly a bad idea, since code will mysteriously break if you run it on a different machine where this configuration hasn't been done.

(1) Create a new file like this (mine's called setEncoding.py):

import sys
# reload because Python removes setdefaultencoding() from the namespace
# see http://stackoverflow.com/questions/2276200/changing-default-encoding-of-python
reload(sys)
sys.setdefaultencoding("utf-8")

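(2) Set the environment variable PYTHONSTARTUP to point at this file.

(3) When the Python interpreter starts an interactive session, the code in the file that PYTHONSTARTUP points at is executed first. (PYTHONSTARTUP is only read in interactive mode, so scripts run as `python script.py` are not affected.)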

bgporter@Ornette ~/temp:python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> sys.getdefaultencoding()
'utf-8'
>>>
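
As a minimal sketch, the startup file from step (1) could be guarded so that it also loads cleanly under Python 3, where reload() is no longer a builtin and the default encoding is already UTF-8. The function name force_utf8_default is hypothetical and not part of the original file; the version check is an added assumption about how you might want to handle mixed images.

```python
import sys


def force_utf8_default():
    """Return the default encoding after trying to switch it to UTF-8.

    Hypothetical helper: on Python 2 it repeats the reload/setdefaultencoding
    trick from the startup file above; on Python 3 it changes nothing.
    """
    if sys.version_info[0] >= 3:
        # Python 3 has no sys.setdefaultencoding(); the default is 'utf-8'.
        return sys.getdefaultencoding()
    # Python 2 only: reload restores the setdefaultencoding() that site.py removed.
    reload(sys)
    sys.setdefaultencoding("utf-8")
    return sys.getdefaultencoding()


force_utf8_default()
```

Because the file is still loaded through PYTHONSTARTUP, this only takes effect in interactive sessions.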
bgporter answered Sep 28 '22 09:09


Take a look at site.py in the standard library: it is where sys.setdefaultencoding is called at startup. I think you could modify or replace this module to make the change permanent on your machine. Here is part of its source; the comments explain what it does:

def setencoding():
    """Set the string encoding used by the Unicode implementation.  The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""

    encoding = "ascii" # Default value set by _PyUnicode_Init()
    if 0:
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]
    if 0:
        # Enable to switch off string to Unicode coercion and implicit
        # Unicode to string conversion.
        encoding = "undefined"
    if encoding != "ascii":
        # On Non-Unicode builds this will raise an AttributeError...
        sys.setdefaultencoding(encoding) # Needs Python Unicode build !

Full source: https://hg.python.org/cpython/file/2.7/Lib/site.py

This is the place where they delete the sys.setdefaultencoding function, if you were wondering:

def main():

    ...

    # Remove sys.setdefaultencoding() so that users cannot change the
    # encoding after initialization.  The test for presence is needed when
    # this module is run as a script, because this code is executed twice.
    if hasattr(sys, "setdefaultencoding"):
        del sys.setdefaultencoding
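
A less invasive variant than editing site.py might be a sitecustomize.py module placed somewhere on sys.path. This is a sketch under an assumption worth verifying against the source linked above: that Python 2.7's site.py imports sitecustomize (if one is importable) before it deletes sys.setdefaultencoding. The helper name apply_default_encoding is hypothetical.

```python
# sitecustomize.py (hypothetical file, e.g. in the image's site-packages directory)
import sys


def apply_default_encoding(encoding="utf-8"):
    """Call sys.setdefaultencoding() if it still exists; report whether it did."""
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        # Already deleted by site.py, or running on Python 3 where it never exists.
        return False
    setter(encoding)
    return True


applied = apply_default_encoding()
```

If the assumption holds, no reload(sys) is needed, and the setting would apply to scripts as well as interactive sessions. The same caveat applies as for any global change: code relying on it breaks on machines without this file.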
user2622016 answered Sep 28 '22 09:09


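You can always add this at the top of your Python files:

# -*- coding: utf-8 -*-

This declares, on any platform, that the source file itself is encoded as UTF-8, so non-ASCII characters in string literals and comments are read correctly. It does not change what sys.getdefaultencoding() returns.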
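
To illustrate what the declaration controls, here is a small sketch that compiles the same source bytes under two different declarations. The function load_value and the "<demo>" filename are hypothetical names used only for this example.

```python
import sys


def load_value(declared_encoding):
    """Compile a tiny module whose first line declares the given encoding."""
    # The literal's raw bytes are the same in every call: 0xC3 0xA9.
    source = (
        b"# -*- coding: " + declared_encoding.encode("ascii") + b" -*-\n"
        b'value = u"\xc3\xa9"\n'
    )
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["value"]


before = sys.getdefaultencoding()
as_utf8 = load_value("utf-8")      # bytes decoded as one character
as_latin1 = load_value("latin-1")  # same bytes decoded as two characters
after = sys.getdefaultencoding()
```

The two results differ only because of the declaration on the first line, while the interpreter's default encoding is the same before and after.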

cmaceachern answered Sep 28 '22 08:09