UnicodeEncodeError: 'ascii' codec can't encode characters

Tags:

python

unicode

elementtree

I have a dict that's fed with URL responses. Like:

>>> d
{
0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
1: {'data': u'<p>some other data</p>'},
...
}

When I call an xml.etree.ElementTree function on these data values (d[0]['data']), I get the most famous error message:

UnicodeEncodeError: 'ascii' codec can't encode characters...

What should I do to this Unicode string to make it suitable for the ElementTree parser?

PS. Please don't send me links to Unicode and Python explanations. I have read them all already, unfortunately, and can't make use of them, as hopefully others can.

475

asked Nov 21 '12 12:11

theta

1 Answer

You'll have to encode it manually, to UTF-8:

ElementTree.fromstring(d[0]['data'].encode('utf-8'))

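as the API (on Python 2, which this example uses) only takes encoded bytes as input; a unicode string gets implicitly encoded with the ASCII codec first, which is where the error comes from. UTF-8 is a good default for such data.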

The parser decodes the bytes back to unicode from there:

>>> from xml.etree import ElementTree
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
found "拉柏 多公 园"
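
To apply the same fix to every entry in the dict, a small wrapper can help. This is only a minimal sketch: parse_unicode_xml is a hypothetical helper name (not part of ElementTree), and the sample dict is copied from the question. It should run on both Python 2.7 and Python 3; on Python 3, fromstring also accepts str directly, so the explicit encode mainly matters on Python 2.

```python
from xml.etree import ElementTree


def parse_unicode_xml(text):
    """Parse an XML document held in a unicode string.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    # Encode to UTF-8 bytes first; the parser decodes them again itself.
    return ElementTree.fromstring(text.encode('utf-8'))


# Sample data shaped like the dict in the question.
d = {
    0: {'data': u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'},
    1: {'data': u'<p>some other data</p>'},
}

# Map each key to the text content of its parsed root element.
texts = {key: parse_unicode_xml(value['data']).text
         for key, value in d.items()}
```

Here texts[0] holds the original Chinese text, decoded back from the UTF-8 bytes that were handed to the parser.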

108

answered Sep 19 '22 13:09

Martijn Pieters
