
UnicodeEncodeError: 'charmap' codec can't encode character... problems

Before anyone gives me crap about this being asked a billion times, please note that I've tried several of the answers in many a thread but none of them seemed to work properly for my problem.

import json
def parse(fn):
    results = []
    with open(fn) as f:
        json_obj = json.load(f)
        for r in json_obj["result"]:
            print(r["name"])

parse("wine.json")

I'm basically just opening a JSON file and iterating over it to print some values. Whenever a value contains certain Unicode characters, I get this error:

Traceback (most recent call last):
  File "json_test.py", line 9, in <module>
    parse("wine.json")
  File "json_test.py", line 7, in parse
    print(r["name"])
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 15: character maps to <undefined>

As suggested in other threads, I've tried encoding it in various ways, but I get a similar error no matter how I encode and/or decode it. Please help.

TumbaBit asked Dec 03 '22 19:12



2 Answers

Everything is fine up until the point where you try to print the string. To print a string, Python must first convert it from Unicode to a byte sequence supported by your output device. That requires encoding to the output's character set, which Python has identified as cp850, the Windows console's default code page on your system. cp850 has no mapping for '\u201c' (a left curly double quote), so the encode step fails.
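The encode step can be reproduced without a console, since print has to perform the same conversion. This is only a minimal sketch, assuming the cp850 code page from the traceback; the sample string and the helper name try_encode are made up for illustration.

```python
# Hypothetical sample value. U+201C and U+201D are curly double quotes,
# which cp850 cannot represent; U+00E2 (a with circumflex) it can.
name = "Ch\u00e2teau \u201cReserve\u201d"


def try_encode(text, encoding="cp850"):
    """Return the encoded bytes, or None if the code page lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Strict encoding (what print effectively does) fails on the curly quotes.
strict = try_encode(name)

# errors="replace" substitutes "?" for each unencodable character.
replaced = name.encode("cp850", errors="replace")

# errors="backslashreplace" keeps the information as a \uXXXX escape.
escaped = name.encode("cp850", errors="backslashreplace")

# UTF-8 can encode every Unicode character, so nothing is lost.
utf8 = name.encode("utf-8")
```

The error handlers trade fidelity for robustness: the accented letter survives because cp850 contains it, and only the curly quotes are altered.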

Starting with Python 3.4 you can set the Windows console to use UTF-8 with the following command issued at the command prompt:

chcp 65001

This should fix your issue, as long as you've configured the window to use a font that contains the character.

Starting with Python 3.6 this is no longer necessary. Windows has always had a full Unicode interface for the console, and Python now uses it in place of the primitive code page I/O, so printing Unicode to the console just works.
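Where output still goes through a code page (an older Python, or, as far as I know, output redirected to a file or pipe), one option is to make the stream escape unencodable characters instead of raising. The following is a sketch, not a definitive fix: it assumes Python 3.7+ for TextIOWrapper.reconfigure(), the helper name make_tolerant is hypothetical, and an in-memory stream stands in for a cp850 console.

```python
import io


def make_tolerant(stream, errors="backslashreplace"):
    """Switch a text stream's error handler so encoding failures do not raise.

    Assumes Python 3.7+; streams without reconfigure() are returned unchanged.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors=errors)
    return stream


# In-memory stand-in for a console that only understands cp850.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp850")
make_tolerant(fake_console)

# With the default strict handler this print would raise UnicodeEncodeError.
print("\u201cquoted\u201d", file=fake_console)
fake_console.flush()

# The curly quotes were written as literal \u201c and \u201d escapes.
output = fake_console.buffer.getvalue()
```

In a real script the call would be make_tolerant(sys.stdout) near the top of the program. Setting the PYTHONIOENCODING environment variable (for example to utf-8) is another way to influence the encoding of the standard streams.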

Mark Ransom answered Dec 26 '22 14:12



What I ended up doing as a possible temporary fix (unless anyone has a better answer) was using Unidecode. Unfortunately, that strips all the accents, but maybe someone has a fix for that.
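One possible way to keep the accents is to touch only the characters the console's code page cannot encode. The sketch below uses only the standard library rather than Unidecode; the function name console_safe, the punctuation table, and the cp850 default are assumptions for illustration.

```python
import unicodedata

# Hypothetical hand-picked ASCII stand-ins for common typographic punctuation.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
})


def console_safe(text, encoding="cp850"):
    """Return text in which every character is encodable in `encoding`.

    Characters the code page supports (including accented letters) are kept.
    Others are reduced to their ASCII base letter if one exists, else "?".
    """
    text = text.translate(PUNCTUATION)
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # Decompose, then drop the combining marks and other non-ASCII parts.
            base = unicodedata.normalize("NFKD", ch)
            base = base.encode("ascii", "ignore").decode("ascii")
            out.append(base or "?")
    return "".join(out)


# Hypothetical sample value with an accent and curly quotes.
cleaned = console_safe("Ch\u00e2teau \u201cReserve\u201d")
```

The accented letter is left alone because cp850 contains it; only the curly quotes are replaced, so print(cleaned) would no longer hit the charmap error on a cp850 console.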

TumbaBit answered Dec 26 '22 15:12
