
codec can't encode character: character maps to <undefined>

Tags:

python

I'm trying to read a docx file in Python 2.7 with this code:

import docx
document = docx.Document('sim_dir_administrativo.docx')
docText = '\n\n'.join([
    paragraph.text.encode('utf-8') for paragraph in document.paragraphs])

And then I'm trying to decode the string inside the file with this code, because I have some special characters (e.g. ã):

print docText.decode("utf-8")

But I'm getting this error:

    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 494457: character maps to <undefined>

How can I solve this?

user3511563 asked Mar 18 '23 22:03


1 Answer

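In Python 2, print encodes unicode strings using your console's encoding, which you can check with sys.stdout.encoding. A character that has no equivalent in that encoding, such as the en dash u'\u2013' in your traceback, raises UnicodeEncodeError. To print text with such characters, encode it to the console encoding yourself and tell the codec what to do with characters it cannot map, for example with errors='replace'.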

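# -*- coding: utf-8 -*-
import sys

# Encoding the console expects for output
print sys.stdout.encoding
# Characters missing from that encoding become '?' instead of raising an error
print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
print u"Стоескер".encode(sys.stdout.encoding, errors='replace')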

This code snippet was taken from a Stack Overflow answer.
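
Applied to the code in the question, a minimal sketch might look like the following. It keeps the text as unicode instead of encoding to UTF-8 and decoding again, and only encodes at the point of output. Several things here are assumptions for illustration: encode_for_console is a hypothetical helper name, the sample paragraphs stand in for document.paragraphs, and cp850 is just one example of a console code page that has ã but no en dash. The sketch is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def encode_for_console(text, encoding=None):
    """Encode unicode text for the console, replacing unmappable characters.

    Hypothetical helper for illustration; not part of python-docx.
    """
    # sys.stdout.encoding can be None when output is piped or redirected,
    # so fall back to UTF-8 in that case (an assumption, adjust as needed).
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # "replace" substitutes '?' for characters the target encoding lacks.
    return text.encode(target, "replace")


# Stand-in for [p.text for p in document.paragraphs]; the text stays unicode.
paragraphs = [u"Direito administrativo", u"Introdu\u00e7\u00e3o \u2013 parte 1"]
doc_text = u"\n\n".join(paragraphs)

# cp850 is passed explicitly so the result does not depend on this machine.
print(repr(encode_for_console(doc_text, "cp850")))
```

In Python 2.7, printing encode_for_console(doc_text) without the second argument would use whatever sys.stdout.encoding reports. The en dash then shows up as '?' rather than stopping the script, while characters the console does support, such as ã, are kept.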

Andrew Johnson answered Apr 07 '23 09:04