
Python prints result as '7\xe6\x9c\x8810\xe6\x97\xa5', but I want '7月10日'

I fetched a web page that contains Japanese text, but when I print it to the console I don't get 7月10日. Instead, it prints: 7\xe6\x9c\x8810\xe6\x97\xa5

What should I do?

user1514160 asked Jul 10 '12 08:07


1 Answer

The output you get is correct: it is the UTF-8 representation of the Japanese string. The problem is that the console does not understand UTF-8. If you write that string to a file and open it with an editor that does understand UTF-8, you'll see the content as you would expect. You could also try changing the console's encoding to UTF-8.
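As a rough illustration of the write-to-a-file suggestion, here is a minimal sketch. It assumes Python 3 (the question itself appears to use Python 2), and the file name `date.txt` and the temporary directory are hypothetical choices made only for this example.

```python
import os
import tempfile

# The UTF-8 bytes shown in the question.
raw = b'7\xe6\x9c\x8810\xe6\x97\xa5'

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'date.txt')

# Write the bytes unchanged; the file now holds valid UTF-8.
with open(path, 'wb') as f:
    f.write(raw)

# Reading the file back as UTF-8 recovers the Japanese text.
with open(path, encoding='utf-8') as f:
    restored = f.read()
```

Opening the resulting file in an editor set to UTF-8 should show the date as expected, since the bytes themselves were never wrong.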

Edit: You could also try something along these lines:

print '7\xe6\x9c\x8810\xe6\x97\xa5'.decode('utf-8')

But whether this works depends on whether the console encoding supports Japanese characters. If, for example, the console's encoding is 'ISO Latin-1', then it won't work...
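The dependence on the console encoding can be checked before printing. The following is only a sketch, assuming Python 3; `can_encode` is a hypothetical helper written for this example, not a standard library function, and the escaped fallback is just one possible choice.

```python
import sys

raw = b'7\xe6\x9c\x8810\xe6\x97\xa5'   # UTF-8 bytes as fetched from the page
text = raw.decode('utf-8')             # decoded text string

def can_encode(s, encoding):
    """Return True if every character of s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False

# sys.stdout.encoding may be None when output is redirected.
console_encoding = sys.stdout.encoding or 'ascii'

if can_encode(text, console_encoding):
    print(text)
else:
    # Fall back to an escaped form instead of raising UnicodeEncodeError.
    print(text.encode('ascii', errors='backslashreplace').decode('ascii'))
```

With a UTF-8 console the first branch prints the date directly; with an encoding such as ISO Latin-1, which has no Japanese characters, the check fails and the escaped fallback is printed instead.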

I suggest you read: http://www.joelonsoftware.com/articles/Unicode.html

Ioan Alexandru Cucu answered Oct 17 '22 08:10