
Python, how to print Japanese, Korean, Chinese strings

In Python, I cannot get Japanese, Chinese, and Korean strings to print correctly. For example, "hello" in Japanese, Korean, and Chinese is:

こんにちは
안녕하세요
你好

When I read these strings from a file and print them, I get:

In [1]: f = open('test.txt')

In [2]: for _line in f.readlines():
   ...:     print(_line)
   ...:     
こんにちは

안녕하세요

你好


In [3]: f = open('test.txt')

In [4]: print(f.readlines())
[ '\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n', '\xec\x95\x88\xeb\x85\x95\xed\x95\x98\xec\x84\xb8\xec\x9a\x94\n', '\xe4\xbd\xa0\xe5\xa5\xbd\n']

In [5]: a = '你好'

In [6]: a
Out[6]: '\xe4\xbd\xa0\xe5\xa5\xbd'

My Python version is 2.7.11 and my OS is Ubuntu 14.04.

How should I handle strings like '\xe4\xbd\xa0\xe5\xa5\xbd\n'?

Thanks!

GoingMyWay asked Apr 15 '16 06:04


3 Answers

First, you need to read the text as unicode:

import codecs
f = codecs.open('test.txt','r','utf-8')
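
As a rough sketch of that first step, the snippet below uses io.open (available in Python 2.6+ and 3) rather than codecs.open, and it assumes the file is UTF-8. The sample text, the temporary file, and the helper name read_unicode_lines are made up for illustration; they stand in for the question's test.txt.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import tempfile

# Hypothetical sample data standing in for the question's test.txt.
SAMPLE = "こんにちは\n안녕하세요\n你好\n"


def read_unicode_lines(path, encoding="utf-8"):
    """Return the file's lines as decoded text, without trailing newlines."""
    with io.open(path, "r", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Write the sample as UTF-8, then read it back as decoded text.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    lines = read_unicode_lines(path)
finally:
    os.remove(path)

# The escaped bytes shown in the question decode to the same text.
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"
decoded = raw.decode("utf-8")
```

Here lines holds text (unicode) objects rather than byte strings, and decoded is the same two characters as the last line of the file.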

Second, when you print, you should encode it like this:

unicodeText.encode('utf-8')

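Third, you should ensure that your console supports Unicode display. To check, use:

import sys
print sys.stdout.encoding, sys.getdefaultencoding()  # console encoding, then the default for implicit str/unicode conversion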

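If it doesn't, a common (though widely discouraged) workaround is:

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding('utf-8')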
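
An alternative to changing the default encoding is to encode explicitly for whatever the output stream declares. This is only a sketch: console_encoding, encode_for_console, and FakeStream are hypothetical names, and the UTF-8 fallback is an assumption (in Python 2, sys.stdout.encoding is None when output is piped).

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys


def console_encoding(stream=None, fallback="utf-8"):
    """Return the stream's declared encoding, or the fallback if it has none."""
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None) or fallback


def encode_for_console(text, stream=None):
    """Encode text for the stream, replacing characters it cannot represent."""
    return text.encode(console_encoding(stream), "replace")


class FakeStream(object):
    """Hypothetical stand-in for a stream such as sys.stdout."""

    def __init__(self, encoding):
        self.encoding = encoding


# No declared encoding: the sketch falls back to UTF-8.
piped = encode_for_console("你好", FakeStream(None))
# An ASCII-only stream cannot represent the text, so each character becomes "?".
ascii_only = encode_for_console("你好", FakeStream("ascii"))
```

The "replace" error handler trades fidelity for not crashing; whether that is acceptable depends on where the output goes.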

sami answered Oct 19 '22 19:10


What you see is the difference between

  1. Printing a string
  2. Printing a list

Or, more generally, the difference between an object's "informal" and "official" string representations (see the Python documentation for __str__ and __repr__).

In the first case, the unicode string will be printed correctly, as you would expect, with the unicode characters.

In the second case, the items of the list will be printed using their representation and not their string value.

for line in f.readlines():
    print line

is the first (good) case, and

print f.readlines()

is the second case.
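
To see why the list output looks different, here is a small sketch showing that the text form of a list is assembled from the repr() of each item. The sample list and the variable names are made up for illustration. It is written to run on both Python 2.7 and 3; note that on Python 3 repr() no longer escapes printable non-ASCII characters, so the effect is much less visible there.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Hypothetical sample lines, already decoded to text.
lines = ["こんにちは", "你好"]

# str() of a list is built from the repr() of each item, not its str().
list_text = str(lines)
from_reprs = "[" + ", ".join(repr(item) for item in lines) + "]"

# To show the items themselves, join them into a single string first.
display = "[" + ", ".join(lines) + "]"
```

Printing display (or each item in a loop) shows the characters, while printing the list itself goes through repr().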

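You can check the difference with this example:

# -*- coding: utf-8 -*-
a = u'ð€œłĸªßð'
print a        # informal form: the characters themselves
print repr(a)  # official form: an escaped literal
l = [a, a]
print l        # list items are shown using their repr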

This shows the difference between the special __str__() and __repr__() methods, which you can experiment with yourself:

class Person(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name
    def __repr__(self):
        return '<Person name={}>'.format(self.name)

p = Person('Donald')
print p  # Prints Donald, using __str__
p        # In the interactive interpreter, shows <Person name=Donald>, using __repr__

I.e., the value you see when simply typing an object name on the console is defined by __repr__ while what you see when you use print is defined by __str__.
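
The same rule explains the list case from the question. As a minimal sketch reusing the Person class above (the variable names as_item and in_list are just for illustration), putting the object in a list switches the output to __repr__:

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return "<Person name={}>".format(self.name)


p = Person("Donald")

as_item = str(p)       # a single object is formatted with __str__
in_list = str([p, p])  # a list formats each item with __repr__
```

So a list of byte strings or unicode strings shows their escaped repr() even though each item prints fine on its own.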

Hannes Ovrén answered Oct 19 '22 21:10


My Python version is 2.7.11 and my operating system is Mac OS X. I wrote

こんにちは
안녕하세요
你好

to test.txt. My program is:

# -*- coding: utf-8 -*-

import json


if __name__ == '__main__':
    # Read the lines, then dump them without \uXXXX escaping.
    with open("./test.txt", "r") as f:
        a = f.readlines()
    print json.dumps(a, ensure_ascii=False)

Running the program gives:

["こんにちは\n", "안녕하세요\n", "你好"]
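
For comparison, here is a short sketch of what ensure_ascii changes. The sample list is hypothetical and uses decoded text rather than the byte strings read above, so that it runs the same way on Python 2.7 and 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import json

# Hypothetical sample lines, already decoded to text.
lines = ["こんにちは\n", "你好"]

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(lines)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(lines, ensure_ascii=False)

# Both forms are valid JSON and load back to the same list.
round_trip = json.loads(escaped)
```

Both outputs describe the same data; ensure_ascii=False only makes the JSON text readable, and the result still has to be encoded in something the console can display.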

Karl Doenitz answered Oct 19 '22 21:10