
Python UnicodeEncodeError: ordinal not in range(128) with Euro sign

I have to read an XML file in Python and grab various things, and I ran into a frustrating UnicodeEncodeError that I couldn't figure out even after googling.

Here are snippets of my code:

#!/usr/bin/python
# coding: utf-8
from xml.dom.minidom import parseString
with open('data.txt','w') as fout:
   #do a lot of stuff
   nameObj = data.getElementsByTagName('name')[0]
   name = nameObj.childNodes[0].nodeValue
   #... do more stuff
   fout.write(','.join((name, other_stuff)))  # other_stuff: placeholder for a bunch of other fields

This spectacularly crashes when a name entry I am parsing contains a Euro sign. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 60: ordinal not in range(128)

I understand why the Euro sign will screw it up (because it's at 128, right?), but I thought doing # coding: utf-8 would fix that. I also tried adding .encode('utf-8') so that the name assignment instead looks like

name = nameObj.childNodes[0].nodeValue.encode('utf-8')

But that doesn't work either. What am I doing wrong? (I am using Python 2.7.3 if anyone wants to know)

EDIT: Python crashes out on the fout.write() line -- it will go through fine where the name field is like:

<name>United States, USD</name>

But will crap out on name fields like:

<name>France, € </name>
Joe asked Mar 06 '13 02:03



2 Answers

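In Python 2, a file opened with the built-in open function works with byte strings. When you write a unicode string to it, Python implicitly encodes that string with the default ASCII codec, which fails on characters such as the Euro sign. To write in another encoding, open the file with codecs: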

import codecs
fout = codecs.open('data.txt','w','utf-8')
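
As a rough sketch of the same idea, io.open also accepts an encoding argument and behaves the same way on Python 2.7 and Python 3. The helper names write_row and read_text, the sample fields, and the temporary file path are made up for illustration and are not from the question:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_row(path, fields):
    """Join unicode fields with commas and write them to path as UTF-8."""
    # io.open encodes unicode text on the way out, so no manual .encode() is needed.
    with io.open(path, 'w', encoding='utf-8') as fout:
        fout.write(u','.join(fields))


def read_text(path):
    """Read the file back as unicode text, decoding from UTF-8."""
    with io.open(path, 'r', encoding='utf-8') as fin:
        return fin.read()


# Hypothetical sample data: a name containing the Euro sign (U+20AC).
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
write_row(path, [u'France, \u20ac ', u'EUR'])
result = read_text(path)
```

With this approach every value passed to write must be unicode text, so values coming straight from minidom can be joined and written without any further conversion.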
Fernando Freitas Alves answered Nov 15 '22 00:11



It looks like you're getting Unicode data from your XML parser, but you're not encoding it before writing it out. You can explicitly encode the result before writing it out to the file:

text = ",".join(stuff) # this will be unicode if any value in stuff is unicode
encoded = text.encode("utf-8") # or use whatever encoding you prefer
fout.write(encoded)
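
A self-contained sketch of that approach, with made-up sample values in stuff and a temporary file standing in for data.txt (both are assumptions, not the asker's data). It also shows that the Euro sign is code point U+20AC (8364), well above the ASCII limit of 127, and that UTF-8 stores it as three bytes:

```python
# -*- coding: utf-8 -*-
import os
import tempfile

# Hypothetical values, as they might come back from the XML parser (all unicode).
stuff = [u'France, \u20ac ', u'1.30']

# The Euro sign's code point; ASCII only covers 0..127.
code_point = ord(u'\u20ac')

# Join while everything is still unicode, then encode once at the end.
text = u','.join(stuff)
encoded = text.encode('utf-8')

# Write the already-encoded bytes; binary mode avoids any implicit re-encoding.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'wb') as fout:
    fout.write(encoded)
```

Encoding once, after the join, avoids mixing encoded byte strings with unicode values, which is one plausible reason calling .encode on the name alone did not help.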
Blckknght answered Nov 14 '22 23:11
