
UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function [duplicate]

People also ask

How do I fix UnicodeEncodeError in Python?

A legacy code page such as cp1252 or cp850 can represent only a limited subset of Unicode characters. Any character without a mapping in the target encoding makes the encoding step fail and raise UnicodeEncodeError. To avoid the error, encode with a codec that covers all of Unicode, such as UTF-8 (for example text.encode('utf-8') or open(path, 'w', encoding='utf-8')), or pass an errors= handler such as 'replace'.

What is Charmap codec in Python?

The Python "UnicodeEncodeError: 'charmap' codec can't encode characters in position" error occurs when a string is encoded to bytes with a codec that cannot represent some of its characters. 'charmap' is the name reported by Python's table-driven single-byte codecs, such as cp1252 or cp850. To solve the error, specify a suitable encoding, e.g. utf-8, when opening the file or encoding the string.
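
A minimal sketch of how the error can arise, assuming Python 3. The sample string and the choice of cp1252 are illustrative and not taken from the question:

```python
# Illustrative sample text; 'ř' (U+0159) has no mapping in cp1252.
text = "Antonín Dvořák"

error = None
try:
    text.encode("cp1252")          # cp1252 is one of the 'charmap' codecs
except UnicodeEncodeError as exc:
    error = exc
    print(exc)                     # the message names the codec and the position

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")
```

The same failure happens implicitly when print() writes to a console whose encoding is such a code page.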

Does Python use UTF-8?

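UTF-8 is one of the most commonly used encodings, and Python 3 uses it by default for source files and for str.encode() and bytes.decode(). The encoding used by open() and by sys.stdout, however, typically follows the locale or console settings, which is why this error tends to show up on Windows consoles. UTF stands for "Unicode Transformation Format", and the '8' means that 8-bit code units are used in the encoding.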


I see three solutions to this:

  1. Change the output encoding, so it will always output UTF-8. See e.g. Setting the correct encoding when piping stdout in Python, but I could not get these examples to work.

  2. The following example code (Python 2) makes the output aware of your target charset.

    # -*- coding: utf-8 -*-
    import sys
    
    print sys.stdout.encoding
    print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
    print u"Стоескер".encode(sys.stdout.encoding, errors='replace')
    

    This example replaces any character in my name that the console encoding cannot represent with a question mark.

    If you create a custom print function, e.g. called myprint, that uses this mechanism to encode output properly, you can simply replace print with myprint wherever necessary without making the whole code look ugly.

  3. Reset the output encoding globally at the beginning of the program:

    The page http://www.macfreek.nl/memory/Encoding_of_Python_stdout has a good summary of what to do to change the output encoding. The section "StreamWriter Wrapper around Stdout" is especially interesting. Essentially it says to replace the output streams with encoding writers like this:

    In Python 2:

    import sys
    import codecs

    if sys.stdout.encoding != 'cp850':
      sys.stdout = codecs.getwriter('cp850')(sys.stdout, 'strict')
    if sys.stderr.encoding != 'cp850':
      sys.stderr = codecs.getwriter('cp850')(sys.stderr, 'strict')
    

    In Python 3:

    import sys
    import codecs

    if sys.stdout.encoding != 'cp850':
      sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
    if sys.stderr.encoding != 'cp850':
      sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')
    

    If used in CGI outputting HTML, you can replace 'strict' with 'xmlcharrefreplace' to get numeric character references (such as &#345;) for characters the encoding cannot represent.

    Feel free to modify the approaches, e.g. by setting different encodings. Note that it still won't work for data whose encoding is unspecified. Any data, input, or text must be correctly convertible to Unicode:

    # -*- coding: utf-8 -*-
    import sys
    import codecs
    sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
    print u"Stöcker"                # works
    print "Stöcker".decode("utf-8") # works
    print "Stöcker"                 # fails
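
On Python 3.7 and later, the effect of the wrapper in option 3 can probably be had more simply by changing the error handler of the existing stream with `reconfigure()`. The sketch below is an assumption-laden illustration rather than part of the original answer: the helper name `make_tolerant` is hypothetical, and an in-memory cp850 stream stands in for a real console so the result does not depend on your terminal.

```python
import io


def make_tolerant(stream, errors="backslashreplace"):
    """Switch a text stream's error handler if it supports reconfigure().

    Hypothetical helper: returns True when the handler was changed,
    False when the stream has no reconfigure() method (Python < 3.7,
    or a replaced sys.stdout).
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(errors=errors)   # the stream's encoding is left unchanged
    return True


# Demo on an in-memory cp850 stream; a real script would pass sys.stdout.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp850", errors="strict")
make_tolerant(console)
console.write("Antonín Dvořák")   # 'ř' is not in cp850 and gets escaped
console.flush()
result = raw.getvalue().decode("cp850")
```

Because only the error handler changes, the script does not need to hardcode a code page such as cp850.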
    

Based on Dirk Stöcker's answer, here's a neat wrapper function for Python 3's print function. Use it just like you would use print.

As an added bonus, compared to the other answers, this won't print your text as a bytes literal (b'content'), but as a normal string ('content'), because of the final decode step.

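import sys

def uprint(*objects, sep=' ', end='\n', file=sys.stdout):
    enc = file.encoding
    if enc == 'UTF-8':
        # UTF-8 can represent every character, so print directly.
        print(*objects, sep=sep, end=end, file=file)
    else:
        # Escape characters the target encoding cannot represent (e.g. \u0159),
        # then decode back to str so print() receives text, not bytes.
        f = lambda obj: str(obj).encode(enc, errors='backslashreplace').decode(enc)
        print(*map(f, objects), sep=sep, end=end, file=file)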

uprint('foo')
uprint(u'Antonín Dvořák')
uprint('foo', 'bar', u'Antonín Dvořák')

For debugging purposes, you could use print(repr(data)).
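
One caveat, assuming Python 3: repr() leaves printable non-ASCII characters as they are, so print(repr(data)) can itself raise UnicodeEncodeError on a limited console. The built-in ascii() escapes them instead. A small sketch with an illustrative sample string:

```python
data = "Antonín Dvořák"   # illustrative sample text

# ascii() works like repr() but escapes every non-ASCII character,
# so the result can be printed on any console.
escaped = ascii(data)
print(escaped)
```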

To display text, always print Unicode. Don't hardcode the character encoding of your environment such as Cp850 inside your script. To decode the HTTP response, see A good way to get the charset/encoding of an HTTP response in Python.
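
As a rough sketch of the decoding step, the charset can be read from the Content-Type header with the standard library. The function names and the UTF-8 fallback below are assumptions made for illustration; they are not taken from the linked answer.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value, lowercased.

    Falls back to `default` (an assumption, not a standard) when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


def decode_body(body, header_value):
    """Decode response bytes to str using the declared charset."""
    charset = charset_from_content_type(header_value)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: degrade gracefully instead of failing.
        return body.decode("utf-8", errors="replace")


text = decode_body("Stöcker".encode("iso-8859-1"), "text/html; charset=ISO-8859-1")
```

Once the body is a str, printing it is subject only to the console's encoding, which is the separate problem the other answers address.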

To print Unicode to the Windows console, you could use the win-unicode-console package.


I dug deeper into this and found the best solutions are here.

http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python

In my case, this solved the error "UnicodeEncodeError: 'charmap' codec can't encode character".

Original code:

print("Process lines, file_name command_line %s\n"% command_line)

New code:

print("Process lines, file_name command_line %s\n"% command_line.encode('utf-8'))
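
Note that on Python 3 the new code formats a bytes object, so the output shows a b'...' literal with escaped bytes rather than the text itself. A hedged sketch of the difference, and of one way to keep readable text, follows; the value of `command_line` is a made-up example.

```python
import sys

command_line = "grep Dvořák notes.txt"   # hypothetical example value

# What the UTF-8 workaround produces on Python 3: the repr of a bytes object.
as_bytes = "%s" % command_line.encode("utf-8")

# Alternative: escape only what the console cannot show, and keep a str.
enc = sys.stdout.encoding or "ascii"     # encoding may be None for some streams
safe = command_line.encode(enc, errors="backslashreplace").decode(enc)
print("Process lines, file_name command_line %s\n" % safe)
```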