
Is there a way to change Python's open() default text encoding?

Can I change the default text encoding of open() (io.open() in 2.7) in a cross-platform way?

That way I wouldn't need to specify open(..., encoding='utf-8') every time.

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
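To check which encoding open() would fall back to on a particular machine, something like the following can be run. It is only a sketch using the standard library; the scratch file is hypothetical and the values printed depend on the platform and locale settings.

```python
import locale
import os
import tempfile

# Encoding that the locale reports as preferred (platform dependent).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# A text-mode file object reports the encoding it actually picked.
fd, path = tempfile.mkstemp()  # hypothetical scratch file
os.close(fd)
with open(path, "w") as f:
    picked = f.encoding
os.remove(path)
print(picked)
```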

However, the documentation doesn't say how to set the preferred encoding. The function lives in the locale module, so do I need to change the locale? Is there a reliable cross-platform way to set a UTF-8 locale? Will it affect anything other than the default text file encoding?

Or are locale changes dangerous (they could break something), so that I should stick to a custom wrapper such as:

def uopen(*args, **kwargs):
    return open(*args, encoding='UTF-8', **kwargs)
user, asked Jul 22 '14 20:07



3 Answers

Don't change the locale or preferred encoding, because:

  • it may affect other parts of your code (or the libraries you're using); and
  • it won't be clear that your code depends on open using a specific encoding.

Instead, use a simple wrapper:

from functools import partial
open_utf8 = partial(open, encoding='UTF-8')

You can also specify defaults for all keyword arguments (should you need to).
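A short usage sketch of the wrapper above, assuming Python 3. The scratch file path and the sample string are hypothetical; the point is that keyword arguments given at the call site override the defaults stored in the partial, and that the wrapper only suits text mode.

```python
import os
import tempfile
from functools import partial

# Same wrapper as above: open() with encoding preset to UTF-8.
open_utf8 = partial(open, encoding="UTF-8")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical scratch file

with open_utf8(path, "w") as f:
    f.write("naïve café")

with open_utf8(path) as f:
    text = f.read()

# A keyword passed at the call site overrides the partial's default.
with open_utf8(path, encoding="latin-1") as f:
    mojibake = f.read()

# Binary mode rejects an encoding, so the wrapper is for text mode only.
try:
    open_utf8(path, "rb")
    binary_ok = True
except ValueError:
    binary_ok = False
```

Reading the file back as latin-1 produces garbled text, which is exactly the kind of mismatch an explicit, visible encoding helps to avoid.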

Peter Sutton, answered Oct 06 '22 20:10


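You can set the default encoding in Python 2, but it's really hacky:

import sys
sys.getdefaultencoding()  # returns the current default encoding ('ascii' on Python 2)
sys.setdefaultencoding("utf8")  # AttributeError: the function is removed from sys at startup, but...
reload(sys)  # Python 2 only: reloading sys brings the function back
sys.setdefaultencoding("utf8")  # now it succeeds
# note: this governs implicit str/unicode conversion, not the encoding io.open picks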

I would instead do

main_script.py

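import io
import __builtin__  # Python 2 name; the module is called builtins in Python 3

old_open = io.open  # io.open accepts encoding=; Python 2's built-in open does not
def uopen(*args, **kwargs):
    # call the saved original: calling open here would recurse into uopen
    return old_open(*args, encoding='UTF-8', **kwargs)
__builtin__.open = uopen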

Then every call to open, anywhere, will use UTF-8. However, it will raise an error if a call passes an encoding explicitly (the keyword is then given twice) or opens a file in binary mode.

Or just pass the encoding explicitly every time you open a file, or use your wrapper.

Python's general philosophy is "explicit is better than implicit", which implies that the "right" solution is to declare your encoding explicitly when opening a file.
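For the explicit route, a minimal sketch that should behave the same on Python 2.7 and Python 3 goes through io.open, which the question already mentions. The scratch file path and the sample text are hypothetical.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "explicit.txt")  # hypothetical scratch file

# io.open takes encoding= on Python 2.7 and is the same function as open on Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9")

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Reading the raw bytes shows what was actually stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```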

Joran Beasley, answered Oct 06 '22 19:10


If you really need to change the default encoding, you can replace the built-in open function.

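import builtins  # __builtins__ is a CPython detail and is a dict outside __main__

original_open = builtins.open
def uopen(*args, **kwargs):
    # only text modes take an encoding; leave binary modes untouched
    if "b" not in (args[1] if len(args) >= 2 else kwargs.get("mode", "")):
        # setdefault keeps an encoding the caller passed explicitly
        kwargs.setdefault("encoding", "UTF-8")
    return original_open(*args, **kwargs)
builtins.open = uopen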

I wrote and tested this snippet after finding a mailing-list thread about replacing print.
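One way to try out this kind of patch without leaving it installed is to undo it in a finally block. The following is a sketch assuming Python 3; the scratch file is hypothetical, and restoring the original afterwards is an addition for illustration rather than part of the answer.

```python
import builtins
import os
import tempfile

original_open = builtins.open

def uopen(*args, **kwargs):
    mode = args[1] if len(args) >= 2 else kwargs.get("mode", "r")
    if "b" not in mode:
        kwargs.setdefault("encoding", "UTF-8")
    return original_open(*args, **kwargs)

path = os.path.join(tempfile.mkdtemp(), "patched.txt")  # hypothetical scratch file

builtins.open = uopen
try:
    with open(path, "w") as f:  # no encoding given: UTF-8 is filled in
        default_used = f.encoding
    with open(path, "w", encoding="latin-1") as f:  # an explicit encoding still wins
        explicit_used = f.encoding
    with open(path, "rb") as f:  # binary mode is passed through untouched
        raw = f.read()
finally:
    builtins.open = original_open  # always undo the global patch
```

Because the patch is process-wide, every library that calls open is affected while it is installed, which is the main reason to keep its scope as small as possible.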

JojOatXGME, answered Oct 06 '22 20:10