I fetched a web page that contains Japanese text, but when I print it to the console I don't get the expected output, 7月10日.
Instead, it prints: 7\xe6\x9c\x8810\xe6\x97\xa5
What should I do?
The output you get is correct: it is the UTF-8 representation of the Japanese string. The problem is that the console itself doesn't understand UTF-8. If you write that string to a file and open it with an editor that does understand UTF-8, you'll see the content as you would expect. You could also try changing the console's encoding to UTF-8.
Edit: You could also try something along these lines:
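To illustrate the file suggestion, here is a minimal sketch. It assumes Python 3 (where byte strings take a `b` prefix), and the file name and temporary-directory location are hypothetical choices for the example, not anything from the question:

```python
import os
import tempfile

# Raw bytes as fetched from the page: the UTF-8 encoding of the date.
raw = b'7\xe6\x9c\x8810\xe6\x97\xa5'

# Decode the bytes into a text string.
text = raw.decode('utf-8')

# Write the text to a file, stating the encoding explicitly.
# The file name and location are hypothetical.
path = os.path.join(tempfile.gettempdir(), 'date_utf8.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading the file back as UTF-8 recovers the same text.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
```

Opening that file in an editor set to UTF-8 should show the Japanese characters, regardless of what the console can display.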
print '7\xe6\x9c\x8810\xe6\x97\xa5'.decode('utf-8')
But whether this works depends on whether the console encoding supports Japanese characters. If, for example, the console's encoding is ISO Latin-1, then it won't work...
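One way to check this up front is to test whether the decoded text can be encoded in the console's encoding before printing it. The following is only a sketch, assuming Python 3; `console_can_show` is a hypothetical helper name, and the escaped fallback is just one possible choice:

```python
import sys

def console_can_show(text, encoding=None):
    """Return True if `text` can be encoded with the given encoding.

    Hypothetical helper: defaults to the encoding reported by
    sys.stdout, falling back to ASCII when none is reported.
    """
    if encoding is None:
        encoding = sys.stdout.encoding or 'ascii'
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

text = b'7\xe6\x9c\x8810\xe6\x97\xa5'.decode('utf-8')

if console_can_show(text):
    print(text)
else:
    # Fall back to an ASCII-safe form with escaped characters.
    print(text.encode('ascii', 'backslashreplace').decode('ascii'))
```

With this helper, `console_can_show(text, 'latin-1')` is false because ISO Latin-1 has no Japanese characters, while `console_can_show(text, 'utf-8')` is true.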
I suggest you read: http://www.joelonsoftware.com/articles/Unicode.html