I have read the HOWTO on Unicode from the official docs and a full, very detailed article as well. Still, I don't understand why it throws this error.
Here is what I'm attempting: I open an XML file that contains characters outside the ASCII range (but inside the allowed XML range). I do that with

    cfg = codecs.open(filename, encoding='utf-8', mode='r')

which runs fine. Looking at the string with repr() also shows me a unicode string.
Now I go ahead and parse it with

    parseString(cfg.read().encode('utf-8'))

Of course, my XML file starts with this:

    <?xml version="1.0" encoding="utf-8"?>
Although I suppose it is not relevant, I also declared utf-8 as the source encoding of my Python script, but since I am not writing unicode characters directly in it, this should not apply here. The same goes for the following line, which is also right at the beginning of the script:

    from __future__ import unicode_literals
Next, I pass the generated object to my own class, where I read tags into variables like this:

    xmldata.getElementsByTagName(tagName)[0].firstChild.data

and assign each result to a variable in my class.
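A minimal sketch of the loading step described above, written so it runs on both Python 2 and 3. The sample file contents, the tag name "name", and the helper load_tag are hypothetical. It differs from the question in one deliberate way: instead of decoding with codecs.open and re-encoding to UTF-8, it passes the raw bytes straight to parseString and lets the parser decode them according to the XML declaration (io.open is swapped in for codecs.open).

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import tempfile
from xml.dom.minidom import parseString

# Hypothetical sample config; u'\xfc' is the character from the error message.
SAMPLE = ('<?xml version="1.0" encoding="utf-8"?>'
          '<config><name>M\xfcller</name></config>')


def load_tag(path, tag_name):
    """Return the text of the first <tag_name> element as a unicode string."""
    # Read raw bytes; the parser decodes them using the XML declaration,
    # so there is no decode/re-encode round trip.
    with io.open(path, 'rb') as f:
        xmldata = parseString(f.read())
    return xmldata.getElementsByTagName(tag_name)[0].firstChild.data


# Write the sample to a temporary file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.xml')
with io.open(fd, 'wb') as f:
    f.write(SAMPLE.encode('utf-8'))
try:
    value = load_tag(path, 'name')
finally:
    os.remove(path)

print(repr(value))
```

The value that comes back is already unicode, so nothing further needs decoding before it is stored in the class.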
Now what works perfectly are these commands (obj is an instance of the class):

    for element in obj:
        print element
And this command works as well:

    print obj.__repr__()
I defined __iter__() to just yield every variable, while __repr__() uses the typical printf-style formatting: "%s" % self.varname
Both commands print perfectly and can output the unicode character. What does not work is this:

    print obj

And now I am stuck, because this throws the dreaded

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 47:
So what am I missing? What am I doing wrong? I am looking for a general solution: I always want to handle strings as unicode, just to avoid any possible errors and to write a compatible program.
Edit: I also defined this:

    def __str__(self):
        return self.__repr__()

    def __unicode__(self):
        return self.__repr__()
From documentation I got that this
I finally solved it. The problem was (I am not sure why) that calling either __str__() or __repr__() directly handled the characters fine, but printing the object directly (as in: print obj) did not work, although it should just call __str__() itself.
The final help came from this article. I had already gotten to the point where it printed to the console (but with a wrong letter) when I used utf-8 encoding. I finally got it perfectly correct by defining this:

    from sys import stdout  # module-level import needed for stdout.encoding

    def __str__(self):
        return self.__repr__().encode(stdout.encoding)
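The same idea as a slightly more defensive sketch that runs on both Python 2 and 3. The class name Config, its two fields, the helper console_encoding, and the UTF-8 fallback are assumptions for illustration, not part of the original code. The formatting lives in __unicode__; on Python 2, __str__ encodes explicitly with sys.stdout.encoding, falling back to UTF-8 when that attribute is missing or None (in Python 2 it is None when output is redirected to a file or pipe). On Python 3, __str__ simply returns the text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2


def console_encoding(stream=None):
    """Encoding of the given stream (default sys.stdout), or UTF-8 if unknown."""
    stream = sys.stdout if stream is None else stream
    return getattr(stream, 'encoding', None) or 'utf-8'


class Config(object):
    """Hypothetical stand-in for the question's class."""

    def __init__(self, name, city):
        self.name = name
        self.city = city

    def __iter__(self):
        # Yield every stored value, like the question's __iter__.
        yield self.name
        yield self.city

    def __unicode__(self):
        # All formatting happens on unicode text.
        return '%s (%s)' % (self.name, self.city)

    if PY2:
        def __str__(self):
            # Python 2: str() must return bytes, so encode explicitly instead
            # of letting Python fall back to the ASCII default encoding.
            return self.__unicode__().encode(console_encoding())
    else:
        def __str__(self):
            # Python 3: str is already unicode text.
            return self.__unicode__()


obj = Config('M\xfcller', 'K\xf6ln')
text = str(obj)
values = list(obj)
```

With this in place, print obj (Python 2) or print(obj) (Python 3) goes through __str__ and never hits the implicit ASCII conversion.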
Now the only open question that remains is: why do print obj.__str__() and print obj behave differently here? It makes no sense to me. And yes, to stress it again: calling the former or __repr__() DID work, and still does with the explicit encoding.
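One explanation consistent with how Python 2 works: print obj.__str__() hands print a unicode object, and print encodes unicode with the output stream's encoding (sys.stdout.encoding, e.g. UTF-8 in the terminal). print obj instead first calls str(obj); when __str__ returns unicode, str() converts it to a byte string using the default encoding, sys.getdefaultencoding(), which is 'ascii' in Python 2. That would produce exactly this error, and it is why encoding explicitly inside __str__ fixes it. The sketch below reproduces the two conversions with explicit encode() calls; it runs on Python 3 too, but it only mimics steps that Python 2 performs implicitly, and the UTF-8 terminal encoding is an assumption.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = 'M\xfcller'  # what __str__() / __repr__() returned: unicode text

# print obj.__str__(): print receives unicode and encodes it with the
# stream's encoding (UTF-8 assumed here for the terminal).
stream_bytes = text.encode('utf-8')

# print obj (Python 2): str(obj) converts the unicode result with the
# default encoding, ASCII, which cannot represent u'\xfc'.
error = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; exc itself is cleared after the block
```

The captured exception carries the same information as the message in the question: the codec ('ascii'), the offending position, and the reason "ordinal not in range(128)".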