
Python3.3: .format() with unicode format_spec

I have a datetime object, and my users provide their own format string so the time is displayed the way they like.

One way I found is to use '{:...}'.format(mydatetime).

import datetime, time

lt = time.localtime(time.time())
d = datetime.datetime.fromtimestamp(time.mktime(lt))
# userString is the format string supplied by the user
print(userString.format(datetime=d))

English users may provide '{datetime:%B %d, %Y}', which formats to December 24, 2013.

Chinese users may provide '{datetime:%Y年%m月%d日}' (in YYYYMMDD format, 年=Year, 月=Month, 日=Day).

But when executing '{datetime:%Y年%m月%d日}'.format(datetime=d), Python raises UnicodeEncodeError: 'locale' codec can't encode character '\u5e74' in position 2: Illegal byte sequence

I know there is a workaround: I can tell my Chinese users to give a format string like '{datetime:%Y}年{datetime:%m}月{datetime:%d}日'. But can't Unicode characters appear in the format_spec? How can I solve this problem?

I'm using Windows.

Thanks

Asked by Gqqnbig on Dec 24 '13 at 14:12


1 Answer

datetime.__format__ calls datetime.strftime, which does some preprocessing and then calls time.strftime (CPython 3.3.3 source).

On Windows, time.strftime uses the C runtime's multibyte-string function strftime instead of the wide-character string function wcsftime. First it has to encode the format string according to the current locale by calling PyUnicode_EncodeLocale. This in turn calls the CRT function wcstombs (MSDN), which uses the currently configured locale for the LC_CTYPE category. If the process is currently using the default "C" locale, wcstombs converts Latin-1 (codes < 256) directly to bytes, and anything else is an EILSEQ error, i.e. "Illegal byte sequence".

Use the locale module to set a new locale. The actual locale names vary by platform, but with Microsoft's setlocale you should be able to just set a language string and use the default codepage for the given language. Generally you shouldn't mess with this for a library, and an application should configure the locale at startup. For example:

>>> import datetime, locale

>>> oldlocale = locale.setlocale(locale.LC_CTYPE, None)
>>> oldlocale
'C'
>>> newlocale = locale.setlocale(locale.LC_CTYPE, 'chinese')

>>> d = datetime.datetime.now()
>>> '{datetime:%Y\u5e74%m\u6708%d\u65e5}'.format(datetime=d)
'2013\u5e7412\u670825\u65e5'
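
For an application that should follow each user's own settings rather than hard-code 'chinese', one option is to adopt the user's default locale once at startup by passing an empty string to setlocale. This is only a sketch: configure_locale is a hypothetical helper name, and whether the resulting code page can encode a given format string still depends on how the user's system is configured.

```python
import locale


def configure_locale():
    """Adopt the user's default locale for all categories (hypothetical helper).

    Returns the locale string now in effect. If the environment names a
    locale the C runtime does not know, the current setting is left alone.
    """
    try:
        # An empty string asks the C runtime for the user's default locale.
        return locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # Passing None only queries the current setting; nothing changes.
        return locale.setlocale(locale.LC_ALL, None)


if __name__ == '__main__':
    # Call once, early, before any locale-dependent formatting happens.
    print(configure_locale())
```

Doing this in one place at startup keeps libraries from changing the process-wide locale behind the application's back.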

If you want the formatted time to use locale-specific names (e.g. month and day), then also set the LC_TIME category:

>>> newlocale = locale.setlocale(locale.LC_TIME, 'chinese')
>>> '{datetime:%B %d, %Y}'.format(datetime=d)              
'\u5341\u4e8c\u6708 25, 2013'
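
If changing the process locale is not an option, a possible alternative is to keep non-ASCII text away from strftime altogether: split the format string into ASCII and non-ASCII runs, pass only the ASCII runs to strftime, and copy the rest through unchanged. The sketch below is an assumption-laden illustration, not something from the CPython source; strftime_unicode and UnicodeSafe are hypothetical names, and it assumes no '%' directive is split by a non-ASCII character.

```python
import datetime
import re

# Matches runs of non-ASCII characters; the group keeps them in the split result.
_NON_ASCII = re.compile(r'([^\x00-\x7f]+)')


def strftime_unicode(dt, fmt):
    """Apply strftime only to the ASCII runs of fmt (hypothetical helper)."""
    pieces = _NON_ASCII.split(fmt)
    # With one capturing group, odd indexes hold the non-ASCII runs.
    return ''.join(
        piece if i % 2 else (dt.strftime(piece) if piece else '')
        for i, piece in enumerate(pieces)
    )


class UnicodeSafe:
    """Wrapper so str.format() routes the format_spec through strftime_unicode."""

    def __init__(self, dt):
        self._dt = dt

    def __format__(self, spec):
        return strftime_unicode(self._dt, spec) if spec else str(self._dt)


d = datetime.datetime(2013, 12, 24)
formatted = '{datetime:%Y\u5e74%m\u6708%d\u65e5}'.format(datetime=UnicodeSafe(d))
# ascii() keeps the demo printable on consoles that cannot show the characters.
print(ascii(formatted))
```

The user's format string stays exactly as they wrote it; only the object passed to format() is wrapped. Locale-specific month and day names from directives such as %B still depend on LC_TIME.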

Answered by Eryk Sun on Oct 01 '22 at 18:10