
Error writing a file with file.write in Python. UnicodeEncodeError

I have never dealt with encoding and decoding strings, so I am quite the newbie on this front. I am receiving a UnicodeEncodeError when I try to write the contents I read from another file to a temporary file using file.write in Python. I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 41333: ordinal not in range(128)

Here is what I am doing in my code. I am reading an XML file and getting the text from the "mydata" tags. I then iterate through the results to look for CDATA:

    import os
    import tempfile
    from lxml import etree  # strip_cdata is an lxml XMLParser option

    parser = etree.XMLParser(strip_cdata=False)
    root = etree.parse("myfile.xml", parser)
    myData = root.findall('./mydata')
    # iterate through list to find text (lua code) contained in elements containing CDATA
    for item in myData:
        myCode = item.text

    # Write myCode to a temporary file.
    tempDirectory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
    file = open(tempDirectory + os.path.sep + "myCode.lua", "w")

    file.write(myCode + "\n")  # raises UnicodeEncodeError
    file.close()

It fails with the UnicodeEncodeError when I hit the following line:

file.write(myCode + "\n")

How should I properly encode and decode this?

asked Mar 13 '14 at 22:03 by user2643864




1 Answer

Python 2.7's open function does not transparently handle unicode characters the way Python 3's does. There is extensive documentation on this, but if you want to write unicode strings directly without encoding them yourself, you can try this:

>>> import codecs
>>> f = codecs.open(filename, 'w', encoding='utf8')
>>> f.write(u'\u201c')
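
The same idea can be sketched with io.open, which takes an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3. The helper name write_text and the sample Lua string are hypothetical, chosen only for illustration; the directory prefix and file name are borrowed from the question.

```python
import io
import os
import tempfile


def write_text(path, text):
    """Write a unicode string to path as UTF-8 (hypothetical helper)."""
    # io.open encodes on write, so no implicit ASCII conversion takes place.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text + u"\n")


temp_directory = tempfile.mkdtemp(prefix="TEST_THIS_")
lua_path = os.path.join(temp_directory, "myCode.lua")

# Sample text containing the curly quotes that triggered the original error.
write_text(lua_path, u"print(\u201chello\u201d)")
```

Note that in text mode io.open only accepts unicode strings on Python 2, which is why the newline is written as u"\n".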

For comparison, this is how the error happens:

>>> f = open(filename, 'w')
>>> f.write(u'\u201c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
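
The failure comes from an implicit conversion: a Python 2 file opened with the plain open expects byte strings, so a unicode string is first encoded with the default ASCII codec. A rough sketch of that step done explicitly, which should behave the same way on Python 2 and 3 (the variable names are only illustrative):

```python
text = u"\u201c"  # LEFT DOUBLE QUOTATION MARK, the character from the traceback

# Encoding to ASCII fails because the code point is above 127.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 succeeds and yields a three-byte sequence.
utf8_bytes = text.encode("utf-8")
```

With the bytes in hand, writing them to a file opened in binary mode ("wb") also works; codecs.open simply performs the encoding step for you.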

answered Oct 23 '22 at 19:10 by metatoaster