I have never dealt with encoding and decoding strings, so I am quite the newbie on this front. I am receiving a UnicodeEncodeError when I try to write the contents I read from another file to a temporary file using file.write in Python. I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 41333: ordinal not in range(128)
Here is what I am doing in my code. I am reading an XML file and getting the text from the "mydata" tag. I then iterate through mydata to look for CDATA:
from lxml import etree
import os
import tempfile

parser = etree.XMLParser(strip_cdata=False)
root = etree.parse("myfile.xml", parser)
myData = root.findall('./mydata')
# iterate through list to find text (lua code) contained in elements containing CDATA
for item in myData:
    myCode = item.text
    # Write myCode to a temporary file.
    tempDirectory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
    file = open(tempDirectory + os.path.sep + "myCode.lua", "w")
    file.write(myCode + "\n")
    file.close()
It fails with the UnicodeEncodeError when I hit the following line:
file.write(myCode + "\n")
How should I properly encode and decode this?
The default 'ascii' codec can only represent the 128 characters with code points 0 to 127. Any character outside that range, such as u'\u201c' (LEFT DOUBLE QUOTATION MARK) in the error above, cannot be encoded, so the write raises UnicodeEncodeError. To avoid this error, use .encode('utf-8') to turn text into bytes before writing it, and .decode('utf-8') to turn bytes back into text after reading them.
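A minimal sketch of the encode/decode approach, assuming the text to write is a unicode string like the question's myCode. The Lua snippet, file name and helper function here are hypothetical stand-ins. The file is opened in binary mode, so the encoding step is explicit rather than implicit; the code runs on Python 3 and should also run on Python 2.7.

```python
import os
import tempfile

# Hypothetical stand-in for the question's myCode (contains U+201C and U+201D).
my_code = u"print(\u201chello\u201d)"


def encodes_as_ascii(text):
    """Return True if text can be encoded with the 'ascii' codec, else False."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


temp_dir = tempfile.mkdtemp(prefix="TEST_THIS_")
path = os.path.join(temp_dir, "myCode.lua")

# Binary mode: the file only accepts bytes, so encode the text explicitly.
with open(path, "wb") as f:
    f.write((my_code + u"\n").encode("utf-8"))

# Read the raw bytes back and decode them with the same codec.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")
```

Binary mode avoids any hidden encoding step; the trade-off is that every write needs an encode and every read needs a decode.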
A related error is UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 4: invalid start byte. It often shows up when loading datasets, and the long message can be hard to read. It usually means the bytes were not written as UTF-8: 0x92, for example, is a right single quotation mark in the Windows-1252 (cp1252) encoding, so decoding with the codec the file was actually written in resolves it.
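A small sketch of that decode failure, assuming the data really was saved as Windows-1252. The byte string is a hypothetical example, not taken from any real dataset; in practice you would need to know (or detect) the file's actual encoding.

```python
# Hypothetical bytes as a Windows-1252 editor would save "don't" with a curly apostrophe.
raw = b"don\x92t"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0x92 is a UTF-8 continuation byte, so it cannot start a character.
    utf8_ok = False

# Decoding with the codec the bytes were written in succeeds.
text = raw.decode("cp1252")
```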
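Python 2.7's built-in open function does not transparently handle Unicode strings the way Python 3's does: when you write a unicode object to a file opened with open, Python 2 implicitly encodes it with the default 'ascii' codec, which is exactly what fails here. There is extensive documentation on this, but if you want to write unicode strings directly without encoding them yourself, you can use codecs.open: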
>>> import codecs
>>> f = codecs.open(filename, 'w', encoding='utf8')
>>> f.write(u'\u201c')
>>> f.close()
For comparison, this is how the error happens:
>>> f = open(filename, 'w')
>>> f.write(u'\u201c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
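Applied to the question's write step, a sketch might use io.open, which exists in Python 2.6+ and is the built-in open in Python 3, so the same call works on both (on Python 2 the text must be unicode). The function name write_code_file and the sample Lua string are hypothetical; the XML parsing part of the question is left out.

```python
import io
import os
import tempfile


def write_code_file(my_code, filename="myCode.lua"):
    """Hypothetical helper: write unicode text to a new temp dir as UTF-8."""
    temp_directory = tempfile.mkdtemp(prefix="TEST_THIS_")
    path = os.path.join(temp_directory, filename)
    # encoding="utf-8" makes the file object encode the text itself,
    # instead of relying on the default 'ascii' codec (Python 2) or
    # the platform's locale encoding (Python 3).
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(my_code + u"\n")
    return path


# Hypothetical Lua snippet containing the curly quotes from the error message.
path = write_code_file(u"print(\u201chello\u201d)")
with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()
```

The with statement also closes the file, so the data is flushed even if a later line raises.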