How to make Python 3 print() output UTF-8

How can I make Python 3 (3.1) print("Some text") to stdout in UTF-8, or how can I output raw bytes?

Test.py

    import sys

    TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
    TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes

    print(sys.getdefaultencoding())
    print(sys.stdout.encoding)
    print(TestText)
    print(TestText.encode("utf8"))
    print(TestText.encode("cp1252","replace"))
    print(TestText2)

Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [xNN]):

    utf-8
    cp1257
    Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
    b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
    b'Test - ??????..\x9a\x8a??\x9e\x8e'
    b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point passing encoded text to print (it only shows the representation of the bytes, not the bytes themselves), and it seems impossible to output raw bytes at all, because print always encodes its output using sys.stdout.encoding.

For example: print(chr(255)) throws an error:

    Traceback (most recent call last):
      File "Test.py", line 1, in <module>
        print(chr(255));
      File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way, print(TestText == TestText2.decode("utf8")) prints False, although the print output is the same.


How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

    def printRAW(*Text):
        RAWOut = open(1, 'w', encoding='utf8', closefd=False)
        print(*Text, file=RAWOut)
        RAWOut.flush()
        RAWOut.close()

    printRAW("Cool", TestText)

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ 

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

davispuh asked Aug 30 '10 02:08



1 Answer

Clarification:

    TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8...it is a Unicode string in Python 3.X.
    TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.
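As a small sketch of that difference (the names `text` and `data` are only for illustration and are not from the question): a `str` holds Unicode code points, and only the result of `.encode()` is a sequence of UTF-8 bytes.

```python
text = "āĀ"                     # str: two Unicode code points
data = text.encode("utf8")      # bytes: the UTF-8 encoding of those code points

print(len(text))                # number of code points in the str
print(len(data))                # number of bytes after encoding
print(data)                     # repr of the bytes object, ASCII-safe
print(data.decode("utf8") == text)  # decoding the bytes gives back the str
```

Each of these two characters takes two bytes in UTF-8, which is why the byte string in the question is longer than the text it represents.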

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

    import sys
    sys.stdout.buffer.write(TestText2)
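If you would rather keep using print(), one possible approach is to wrap a binary stream in a text wrapper with an explicit encoding. This is only a sketch: `utf8_writer` is a hypothetical helper name, and the demo writes to an `io.BytesIO` object standing in for `sys.stdout.buffer` so that the resulting bytes can be inspected.

```python
import io

def utf8_writer(binary_stream):
    """Return a text stream that encodes everything written to it as UTF-8.

    Hypothetical helper for illustration; newline="\n" turns off
    newline translation so the bytes are the same on every platform.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf8", newline="\n")

# BytesIO stands in for sys.stdout.buffer in this demo.
fake_stdout = io.BytesIO()
out = utf8_writer(fake_stdout)

print("Test - āĀ", file=out)    # print() encodes through the wrapper
out.flush()                     # push the encoded text into the binary stream

captured = fake_stdout.getvalue()
```

With a real console you would pass `sys.stdout.buffer` instead of the `BytesIO` object. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` changes the encoding of the existing stream in place, which may be simpler.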
Mark Tolonen answered Sep 23 '22 10:09