Can I change the default text encoding of open() (io.open() in 2.7) in a cross-platform way, so that I don't need to specify open(..., encoding='utf-8') each time?
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
However, the documentation doesn't say how to set the preferred encoding. The function is in the locale module, so do I need to change the locale? Is there a reliable cross-platform way to set a UTF-8 locale? Will it affect anything other than the default text file encoding?
Or are locale changes dangerous (they can break something), so that I should stick to a custom wrapper such as:
def uopen(*args, **kwargs):
    return open(*args, encoding='UTF-8', **kwargs)
Re-enabling sys.setdefaultencoding() is not a safe thing to do, though. It is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Re-enabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).
The default encoding in str.encode() and bytes.decode() is UTF-8. One other property is more nuanced: the default encoding of the built-in open() is platform-dependent and depends on the value of locale.getpreferredencoding().
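A minimal sketch of that difference, assuming Python 3. The variable names are only illustrative, and the value reported by locale.getpreferredencoding(False) depends on the machine it runs on:

```python
import locale

# str.encode() and bytes.decode() fall back to UTF-8 when no encoding is given.
encoded = "Δv".encode()
assert encoded == "Δv".encode("utf-8")
assert encoded.decode() == "Δv"

# The documentation quoted in the question says text-mode open() consults
# this platform-dependent value when no encoding is passed.
preferred = locale.getpreferredencoding(False)
print(preferred)  # varies by platform and locale settings
```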
Encoding and Decoding in Python 3
Python 3's str type is meant to represent human-readable text and can contain any Unicode character. The bytes type, conversely, represents binary data, or sequences of raw bytes, that does not intrinsically have an encoding attached to it. Encoding and decoding is the process of going from one to the other.
Encoded Unicode text is represented as binary data (bytes). The str type can contain any literal Unicode character, such as "Δv / Δt", all of which will be stored as Unicode. Python 3 accepts many Unicode code points in identifiers, meaning résumé = "~/Documents/resume.pdf" is valid if this strikes your fancy.
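The round trip between str and bytes described above could be sketched as follows, assuming Python 3 and UTF-8 as the chosen encoding. The variable names are illustrative:

```python
text = "Δv / Δt"

data = text.encode("utf-8")      # str -> bytes
restored = data.decode("utf-8")  # bytes -> str

assert isinstance(data, bytes)
assert restored == text

# Each Δ takes two bytes in UTF-8, so the byte count exceeds the character count.
print(len(text), len(data))  # 7 9

# Non-ASCII identifiers are legal in Python 3.
résumé = "~/Documents/resume.pdf"
```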
Don't change the locale or preferred encoding, because it then stops being obvious that your code depends on open using a specific encoding. Instead, use a simple wrapper:
from functools import partial
open_utf8 = partial(open, encoding='UTF-8')
You can also specify defaults for all keyword arguments (should you need to).
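As a rough illustration of presetting several keyword arguments at once, assuming Python 3: errors and newline are real open() parameters, while the file name and the temporary directory are made up for the demo.

```python
import os
import tempfile
from functools import partial

# Preset several open() keywords; each can still be overridden per call.
open_utf8 = partial(open, encoding="UTF-8", errors="strict", newline="\n")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # throwaway demo file

with open_utf8(path, "w") as f:
    f.write("Δv / Δt\n")

with open_utf8(path) as f:
    content = f.read()

# A keyword passed at call time replaces the preset one.
with open_utf8(path, encoding="latin-1") as f:
    misread = f.read()

assert content == "Δv / Δt\n"
assert misread != content  # the same bytes decoded with the wrong codec
```

One caveat with this design: the wrapper always passes encoding, so calling it with a binary mode such as "rb" raises ValueError. Use plain open for binary files.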
You can set the default encoding, but it's really hacky (Python 2 only, and it changes implicit str/unicode conversion rather than the encoding used by io.open()):
import sys
sys.getdefaultencoding()  # returns your default encoding
sys.setdefaultencoding("utf8")  # error: no setdefaultencoding, but...
reload(sys)
sys.setdefaultencoding("utf8")  # now it succeeds
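I would instead do this in main_script.py:
import __builtin__  # Python 2 name; the module is called builtins in Python 3
import io
old_open = io.open  # io.open accepts encoding=; the Python 2 built-in open does not
def uopen(*args, **kwargs):
    # Call the saved original, not open, which would recurse into uopen
    return old_open(*args, encoding='UTF-8', **kwargs)
__builtin__.open = uopen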
Then anywhere you call open it will use the UTF-8 encoding. However, it will give you errors if you also pass an encoding explicitly.
Or just explicitly pass the encoding any time you open a file, or use your wrapper.
Python's general philosophy is "explicit is better than implicit", which implies that the "right" solution is to explicitly declare your encoding when opening a file.
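For comparison, the explicit approach might look like the sketch below. It uses io.open, which accepts encoding on both Python 2.7 and Python 3 (where it is the same function as open). The file name and temporary directory are hypothetical.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # throwaway demo file

# The encoding is stated at every call site, so nothing depends on the locale.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"résumé\n")

with io.open(path, encoding="utf-8") as f:
    text = f.read()

assert text == u"résumé\n"
```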
If you really need to change the default encoding, you can replace the built-in open function.
import builtins
original_open = builtins.open

def uopen(*args, **kwargs):
    # Only text mode takes an encoding; leave binary mode untouched
    mode = args[1] if len(args) >= 2 else kwargs.get("mode", "")
    if "b" not in mode:
        kwargs.setdefault("encoding", "UTF-8")
    return original_open(*args, **kwargs)

builtins.open = uopen
I wrote and tested this snippet after I found these mails about replacing print on a mailing list.