Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transform unicode string in python

{u'Status': u'OK', u'City': u'Ciri\xe8', u'TimezoneName': '', u'ZipPostalCode': '', u'CountryCode': u'IT', u'Dstoffset': u'0', u'Ip': u'x.x.x.x', u'Longitude': u'7.6', u'CountryName': u'Italy', u'RegionCode': u'12', u'Latitude': u'45.2333', u'Isdst': '', u'Gmtoffset': u'0', u'RegionName': u'Piemonte'}

This is the output of my object. I would like to access City but It's encoded. How can I read all parameters and decode it

>>> data['City']
u'Ciri\xe8'

>>>data['City'].decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 4: ordinal not in range(128)

I want plaintext not unicode string. Thank you!

like image 241
dani Avatar asked Apr 22 '12 02:04

dani


People also ask

How do you Unicode a string in Python?

Encodings. To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 through 0x10FFFF (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of code units, and code units are then mapped to 8-bit bytes.

Can we convert Unicode to text?

World's simplest unicode tool. This browser-based utility converts fancy Unicode text back to regular text. All Unicode glyphs that you paste or enter in the text area as the input automatically get converted to simple ASCII characters in the output.

What does Unicode () do in Python?

If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised.


1 Answers

What you want is not clear. If by 'plaintext' you mean remove accentuation, try this:

>>> s = u'Ciri\xe8'
>>> from unicodedata import normalize
>>> normalize('NFKD', s).encode('ASCII', 'ignore')
'Cirie'
like image 107
Pedro Werneck Avatar answered Sep 17 '22 22:09

Pedro Werneck