Python Unicode Encode Error ordinal not in range with Euro Sign

Question

I have to read an XML file in Python and grab various things, and I ran into a frustrating UnicodeEncodeError that I couldn't figure out even after googling.

Here are snippets of my code:

#!/usr/bin/python
# coding: utf-8
from xml.dom.minidom import parseString
with open('data.txt', 'w') as fout:
    # do a lot of stuff (data is the parsed XML document)
    nameObj = data.getElementsByTagName('name')[0]
    name = nameObj.childNodes[0].nodeValue
    # ... do more stuff
    # other_stuff is a placeholder for the remaining fields
    fout.write(','.join((name, other_stuff)))

This spectacularly crashes when a name entry I am parsing contains a Euro sign. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 60: ordinal not in range(128)

I understand why the Euro sign will screw it up (because it's at 128, right?), but I thought doing # coding: utf-8 would fix that. I also tried adding .encode('utf-8') so that the name looks instead like

name = nameObj.childNodes[0].nodeValue.encode('utf-8')

But that doesn't work either. What am I doing wrong? (I am using Python 2.7.3 if anyone wants to know)

EDIT: Python crashes out on the fout.write() line -- it will go through fine where the name field is like:

<name>United States, USD</name>

But will crap out on name fields like:

<name>France, € </name>

Fernando Freitas Alves · Accepted Answer

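When you open a file in Python 2 with the built-in open function, the file object works with byte strings. Writing a unicode object to it makes Python encode the text implicitly with the default ASCII codec, which fails for any character outside the ASCII range, such as the Euro sign. To write the file in another encoding, use codecs: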

import codecs
fout = codecs.open('data.txt','w','utf-8')
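
A minimal sketch of this approach follows. The field values and the temporary file path are made up for illustration; in the question they would come from the XML parser. It writes a line containing the Euro sign through codecs.open and reads it back to check the round trip.

```python
import codecs
import os
import tempfile

# Hypothetical sample values standing in for the parsed XML fields.
name = u'France, \u20ac'  # \u20ac is the Euro sign
fields = [name, u'EUR', u'42']

# Temporary location so the sketch does not overwrite a real data.txt.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# codecs.open returns a file object that encodes unicode text
# to UTF-8 as it is written.
with codecs.open(path, 'w', 'utf-8') as fout:
    fout.write(u','.join(fields))

# Read the file back with the same encoding to confirm the round trip.
with codecs.open(path, 'r', 'utf-8') as fin:
    round_trip = fin.read()
```

With this wrapper the write call receives unicode text directly, so no explicit .encode() call is needed anywhere else in the script.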

Blckknght · Answer

It looks like you're getting Unicode data from your XML parser, but you're not encoding it before writing it out. You can explicitly encode the result before writing it out to the file:

text = ",".join(stuff) # this will be unicode if any value in stuff is unicode
encoded = text.encode("utf-8") # or use whatever encoding you prefer
fout.write(encoded)
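
Here is a self-contained sketch of the same idea, assuming made-up values for stuff and a temporary output path. It joins the unicode fields first, encodes the whole line once, and writes the resulting bytes to a file opened in binary mode.

```python
import os
import tempfile

# Hypothetical sample values; in the question these come from the XML parser.
stuff = [u'France, \u20ac', u'EUR']

text = u','.join(stuff)         # still unicode text at this point
encoded = text.encode('utf-8')  # byte string; the Euro sign becomes 3 bytes

# Temporary location so the sketch does not overwrite a real data.txt.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# Binary mode, because the encoding step has already produced bytes.
with open(path, 'wb') as fout:
    fout.write(encoded)

# Read the raw bytes back to inspect what landed on disk.
with open(path, 'rb') as fin:
    raw = fin.read()
```

Encoding the joined line once, rather than a single field, may also explain why encoding only name did not help: in Python 2, joining an encoded byte string with values that are still unicode makes Python decode the bytes again with the ASCII codec, which fails on the Euro sign in the same way.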

Tags: python, unicode, ascii, python-2.7

Joe
