I'm trying to read a .docx file in Python 2.7 with this code:
import docx
document = docx.Document('sim_dir_administrativo.docx')
docText = '\n\n'.join([
    paragraph.text.encode('utf-8') for paragraph in document.paragraphs])
Then I try to decode the resulting string with this code, because the text contains some special characters (e.g. ã):
print docText.decode("utf-8")
But I get this error:
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 494457: character maps to <undefined>
How can I solve this?
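print can only output characters that exist in your local (console) encoding, which you can check with sys.stdout.encoding. When you print a unicode string, Python 2 encodes it to that encoding implicitly, and the console's code page here has no mapping for u'\u2013' (an en dash), so that implicit encode fails. To print text with such characters, encode it to your local encoding yourself and decide how characters it cannot represent should be handled.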
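The failure can be reproduced without python-docx. The sketch below assumes cp850 purely as an example console code page (the question does not say which one is in use); like the codec in the traceback, Python implements it on top of codecs.charmap_encode, so it reports itself as 'charmap'. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Sample text with a Portuguese letter (ã) and an en dash (U+2013)
text = 'S\u00e3o Paulo \u2013 Brasil'

# cp850 is an assumed example of a console code page: it contains ã but has
# no en dash, so strict encoding (what print does implicitly) fails.
try:
    text.encode('cp850')
except UnicodeEncodeError as exc:
    print('strict encoding failed:', exc.encoding, exc.reason)

# errors='replace' swaps each unencodable character for '?' instead of raising
safe = text.encode('cp850', 'replace')
print(safe.decode('cp850'))
```

Only the en dash is lost; ã survives because cp850 can represent it.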
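# -*- coding: utf-8 -*-
import sys

# Encoding of the console; it can be None when output is redirected,
# so fall back to UTF-8 in that case
encoding = sys.stdout.encoding or 'utf-8'
print encoding
# errors='replace' writes '?' for characters the encoding cannot represent
print u"Stöcker".encode(encoding, errors='replace')
print u"Стоескер".encode(encoding, errors='replace')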
This snippet was taken from this Stack Overflow answer.
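Applied to the code in the question, a minimal sketch might join the paragraph text directly instead of encoding it to UTF-8 and decoding it back, and only adapt it to the console at print time. The helper make_printable is hypothetical, the sample paragraphs stand in for document.paragraphs so the snippet runs without the .docx file, and the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import sys


def make_printable(text, encoding=None):
    """Hypothetical helper: return a copy of the unicode string `text` in which
    every character the target encoding cannot represent is replaced by '?'."""
    if encoding is None:
        # sys.stdout.encoding can be None when output is redirected
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, 'replace').decode(encoding)


# Stand-in for the text of document.paragraphs read with python-docx
paragraphs = ['Direito administrativo \u2013 introdu\u00e7\u00e3o', 'S\u00e3o Paulo']
doc_text = '\n\n'.join(paragraphs)

# Every remaining character is representable in the console encoding,
# so the implicit encode done by print no longer fails
print(make_printable(doc_text))
```

Replacing characters loses information, so if you need the exact text it is safer to write it to a UTF-8 file (for example with io.open(path, 'w', encoding='utf-8')) than to the console.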