
locale.getpreferredencoding() - why does this reset string.letters?

>>> import string
>>> import locale
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> locale.getpreferredencoding()
'UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

Any workarounds for this?

Platform: Linux. Python 2.6.7 and Python 2.7.3 seem to be affected; it works fine in Python 3 (with ascii_letters).

Anthony Sottile asked May 19 '14 16:05

1 Answer

Note: what the OP did to solve the issue was to pass encoding='UTF-8' to the open call. If you run into this issue and are just looking for a fix, that works. The rest of this post explains why the issue happens.


What happens

As Lukas said, the docs specify:

On some systems, it is necessary to invoke setlocale() to obtain the user preferences

Initially, string.letters is defined as lowercase + uppercase:

lowercase = 'abcdefghijklmnopqrstuvwxyz'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
letters = lowercase + uppercase

However, when you call getpreferredencoding(), the _locale module overwrites it: its fixup_ulcase(void) function regenerates the letters string as follows and then stores it with PyDict_SetItemString(string, "letters", ulo):

/* create letters string */
n = 0;
for (c = 0; c < 256; c++) {
    if (isalpha(c))
        ul[n++] = c;
}
ulo = PyString_FromStringAndSize((const char *)ul, n);
if (!ulo)
    return;
if (string)
    PyDict_SetItemString(string, "letters", ulo);
Py_DECREF(ulo);
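The loop walks byte values in numeric order, and 'A' (65) comes before 'a' (97), so the rebuilt string puts uppercase first. That is exactly the reordering seen in the question. Here is a minimal Python model of the loop. It assumes the C/POSIX locale, where only ASCII A-Z and a-z count as alphabetic. That assumption is relevant because the last setlocale call inside getpreferredencoding restores the original, usually C, locale. The predicate parameter is a hypothetical stand-in for C's isalpha(). The sketch runs under both Python 2 and 3.

```python
import string


def letters_like_fixup_ulcase(is_alpha_byte):
    """Rebuild a 'letters' string the way the C loop in fixup_ulcase() does.

    `is_alpha_byte` is a hypothetical stand-in for C's isalpha() under the
    current LC_CTYPE; the real code calls isalpha() directly.
    """
    # Walk every byte value in numeric order, like `for (c = 0; c < 256; c++)`,
    # keeping the ones the predicate accepts.
    return "".join(chr(c) for c in range(256) if is_alpha_byte(c))


def c_locale_isalpha(c):
    # Assumption: in the C/POSIX locale only ASCII A-Z and a-z are alphabetic.
    return chr(c) in string.ascii_letters


print(letters_like_fixup_ulcase(c_locale_isalpha))
# ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```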

In turn, fixup_ulcase is called from PyLocale_setlocale, the C implementation of locale.setlocale, whenever LC_CTYPE or LC_ALL is set. getpreferredencoding calls setlocale (code here: http://hg.python.org/cpython/file/07a6fca7ff42/Lib/locale.py#l612):

def getpreferredencoding(do_setlocale = True):
    """Return the charset that the user is likely using,
    according to the system configuration."""
    if do_setlocale:
        oldloc = setlocale(LC_CTYPE)
        try:
            setlocale(LC_CTYPE, "")
        except Error:
            pass
        result = nl_langinfo(CODESET)
        setlocale(LC_CTYPE, oldloc)
        return result
    else:
        return nl_langinfo(CODESET)
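The function saves and restores LC_CTYPE, so the locale setting itself round-trips. In Python 2, string.letters does not round-trip. Both setlocale(LC_CTYPE, "") and the restoring setlocale(LC_CTYPE, oldloc) go through PyLocale_setlocale, and therefore through fixup_ulcase. Calling setlocale with only a category merely queries the current value. A small sketch that checks the locale from the outside (the helper name is hypothetical):

```python
import locale


def lc_ctype_round_trip():
    """Return (before, after, encoding) around a getpreferredencoding() call."""
    # setlocale(category) with no locale argument only queries the setting,
    # so these two calls do not modify anything themselves.
    before = locale.setlocale(locale.LC_CTYPE)
    encoding = locale.getpreferredencoding()  # do_setlocale defaults to True
    after = locale.setlocale(locale.LC_CTYPE)
    return before, after, encoding


before, after, encoding = lc_ctype_round_trip()
print(before == after, encoding)
```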

How do I avoid it?

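Call getpreferredencoding(False). With do_setlocale=False it skips both setlocale calls and returns nl_langinfo(CODESET) directly, so fixup_ulcase never runs and string.letters is left alone.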
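getpreferredencoding(False) only reads the current LC_CTYPE. In Python 2 that is usually still the C locale unless the program has called setlocale(LC_CTYPE, "") itself, so it may report the C codeset (for example 'ANSI_X3.4-1968' on glibc) rather than 'UTF-8'. If you need the user's setting, another option is to snapshot the affected string attributes and put them back afterwards. The sketch below assumes the only attributes touched are letters, lowercase and uppercase, which are the locale-dependent ones in Python 2's string module. The helper name is hypothetical. If you just need a fixed alphabet, string.ascii_letters is simpler, because _locale never rewrites it.

```python
import locale
import string

# Assumption: these are the attributes Python 2's _locale module rebuilds on
# setlocale(LC_CTYPE, ...) or setlocale(LC_ALL, ...). None of them exist in
# Python 3, so there the helper saves and restores nothing.
_LOCALE_DEPENDENT = ("letters", "lowercase", "uppercase")


def getpreferredencoding_preserving_string():
    """Hypothetical helper: getpreferredencoding() without clobbering `string`."""
    saved = dict((name, getattr(string, name))
                 for name in _LOCALE_DEPENDENT if hasattr(string, name))
    try:
        return locale.getpreferredencoding()
    finally:
        # Restore the values the string module had before the call.
        for name, value in saved.items():
            setattr(string, name, value)


print(getpreferredencoding_preserving_string())
```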

Why does it not happen on Windows?

On Windows, locale.py defines getpreferredencoding differently: it returns _locale._getdefaultlocale()[1] and never calls setlocale, as you can see here.

In Python 3

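In Python 3, the string module no longer has the locale-dependent letters, lowercase and uppercase attributes (only ascii_letters and its ASCII siblings remain). The _locale module also no longer patches the string module when setlocale is called, so the problem cannot occur.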
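Under Python 3 this is easy to confirm. The following check (helper name hypothetical) calls getpreferredencoding(), which may call setlocale internally, and then verifies that string has no letters attribute and that ascii_letters is untouched:

```python
import locale
import string


def check_python3_string_module():
    """Return (has_letters, ascii_letters_unchanged) around getpreferredencoding()."""
    before = string.ascii_letters
    locale.getpreferredencoding()  # may call setlocale(LC_CTYPE, ...) internally
    return hasattr(string, "letters"), string.ascii_letters == before


has_letters, unchanged = check_python3_string_module()
print(has_letters, unchanged)  # Python 3: False True
```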

Benjamin Gruenbaum answered Sep 29 '22 16:09