
How can I write Unicode to a file easily in Python?

I want to make sure all strings in my code are Unicode, so I use unicode_literals. Then I need to write a string to a file:

from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文") # UnicodeEncodeError

so I need to do this:

from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))

But then I have to call .encode() on every write. I am lazy, so I switched to codecs:

from __future__ import unicode_literals
from codecs import open
import locale, codecs
lang, encoding = locale.getdefaultlocale()

with open('/tmp/test', 'wb', encoding) as f:
    f.write("中文")

I still think this is too much code just to write to a file. Is there an easier method?

roger asked Jan 29 '16 06:01




1 Answer

You don't need to call .encode() and you don't need to call locale.getdefaultlocale() explicitly:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io

with io.open('/tmp/test', 'w') as file:
    file.write(u"中文" * 4)

It uses the character encoding returned by locale.getpreferredencoding(False) to save Unicode text to the file.
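
Because the encoding comes from the locale, the write can still raise UnicodeEncodeError if the locale's encoding cannot represent the text (an ASCII-only "C" locale, for example). A minimal sketch of one way around that, assuming you want UTF-8 regardless of the locale; the temporary path is a placeholder and is not part of the original code:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical scratch path standing in for '/tmp/test'.
path = os.path.join(tempfile.mkdtemp(), 'test')

# An explicit encoding makes the output independent of the locale.
with io.open(path, 'w', encoding='utf-8') as file:
    file.write(u"中文" * 4)

# Read the raw bytes back to confirm that UTF-8 was used.
with io.open(path, 'rb') as file:
    data = file.read()
```

This runs unchanged on Python 2 and Python 3, since io.open() accepts the encoding argument on both.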

On Python 3:

  • you don't need the explicit encoding declaration (# -*- coding: utf-8 -*-) to use literal non-ascii characters in your Python source code. utf-8 is the default source encoding.

  • you don't need import io: the builtin open() is io.open() there

  • you don't need the u'' prefix. '' literals are Unicode by default. If you want to omit u'' on Python 2 as well, put back from __future__ import unicode_literals, as in the code in your question.

i.e., the complete Python 3 code is:

#!/usr/bin/env python3

with open('/tmp/test', 'w') as file:
    file.write("中文" * 4)
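
If even the with block feels like too much, pathlib on Python 3.5+ may be an option: Path.write_text() opens the file, writes the text, and closes it in one call. A minimal sketch, assuming UTF-8 is the encoding you want; the temporary path is a placeholder and is not part of the original code:

```python
#!/usr/bin/env python3
import tempfile
from pathlib import Path

# Hypothetical scratch location standing in for '/tmp/test'.
path = Path(tempfile.mkdtemp()) / 'test'

# write_text() opens the file in text mode, writes the string, and closes it.
path.write_text("中文" * 4, encoding='utf-8')

# Read it back with the same encoding to check the round trip.
text = path.read_text(encoding='utf-8')
```

Omitting encoding here falls back to the locale's preferred encoding, the same default that open() uses.
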

jfs answered Oct 21 '22 14:10