
Python print Unicode string via 'Git Bash' gets 'UnicodeEncodeError'


In test.py I have:

print('Привет мир')

With cmd it runs without an error:

> python test.py
?????? ???

With Git Bash I get an error:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440')
  File "C:\Users\raksa\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>


Does anyone know why I get this error when running Python code via Git Bash?

Asked by raksa on Aug 13 '17 at 13:08





2 Answers

Python 3.6 writes Unicode to the Windows console directly through the Windows API, so it is much better at printing non-ASCII characters there. Git Bash is not the standard Windows console, so Python falls back to its previous behavior: it encodes the Unicode string with the encoding it detects for the terminal (cp1252 in your case). cp1252 has no Cyrillic characters, so the encode step fails. This is "normal"; you will see the same behavior in Python 3.5 and older.
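As a rough illustration of that failure (a sketch, not code from the original answer), the snippet below encodes the question's string with cp1252 and with UTF-8. The helper name `can_encode` is made up for this example.

```python
# The string from the question's test.py.
text = 'Привет мир'


def can_encode(s, encoding):
    """Return True if every character of s can be represented in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        # Same exception type that print() raises in the traceback above.
        return False


print(can_encode(text, 'cp1252'))  # False: cp1252 has no Cyrillic letters
print(can_encode(text, 'utf-8'))   # True: UTF-8 covers all of Unicode
```

print() performs the same encode step on its way to stdout, which is why the traceback ends inside encodings\cp1252.py.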

In the Windows console, Python 3.6 should print the actual Cyrillic characters, so the surprising part is your "?????? ???". That is not "normal"; perhaps the font you have selected does not support Cyrillic. I have a couple of Python versions installed, and this is what I see:

C:\>py -3.6 --version
Python 3.6.2

C:\>py -3.6 test.py
Привет мир

C:\>py -3.3 --version
Python 3.3.5

C:\>py -3.3 test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440 \u4f60\u597d')
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
Answered by Mark Tolonen on Sep 22 '22 at 14:09



I had this problem with Python 3.9. To check which encodings Python is using:

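import sys, locale
# Encoding that print() uses when writing to stdout
print("encoding", sys.stdout.encoding)
# Encoding preferred by the current locale settings
print("local preferred", locale.getpreferredencoding())
# Encoding used for file names
print("fs encoding", sys.getfilesystemencoding())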

If this reports "cp1252" rather than "utf-8", print() fails for any character that cp1252 cannot represent, such as Cyrillic.

I fixed this by changing the Windows system locale:

Region settings > Additional settings > Administrative > Change system locale > Beta: Use Unicode UTF-8 for worldwide language support
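If changing the system-wide locale is not an option, a per-script alternative might look like the sketch below. It is not part of the original answer: the helper names `utf8_writer` and `force_utf8_stdout` are hypothetical, and it assumes the terminal itself (for example Git Bash) renders UTF-8 output.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


def force_utf8_stdout():
    """Hypothetical helper: make print() emit UTF-8 for this process only."""
    if hasattr(sys.stdout, "reconfigure"):
        # Python 3.7+: change the encoding of the existing stream in place.
        sys.stdout.reconfigure(encoding="utf-8")
    else:
        # Older versions such as 3.6: re-wrap the underlying binary buffer.
        sys.stdout = utf8_writer(sys.stdout.buffer)
```

Calling force_utf8_stdout() at the top of test.py, before the print() call, would apply it. Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is another option that needs no code change.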
Answered by David Stephan on Sep 24 '22 at 14:09
