
Convert string from xmlcharrefreplace back to utf-8

I have the following code:

In [8]: st = u"опа"

In [11]: st.encode("ascii", "xmlcharrefreplace")
Out[11]: '&#1086;&#1087;&#1072;'

In [14]: st1 = st.encode("ascii", "xmlcharrefreplace")

In [15]: st1.decode("ascii", "xmlcharrefreplace")
Out[15]: u'&#1086;&#1087;&#1072;'

In [16]: st1.decode("utf-8", "xmlcharrefreplace")
Out[16]: u'&#1086;&#1087;&#1072;'

Do you have any idea how to convert st1 back to u"опа"?

Tural Gurbanov asked Jun 27 '13 11:06




1 Answer

Use the html.unescape() function (Python 3.4 and newer):

>>> import html
>>> html.unescape('&#1086;&#1087;&#1072;')
'опа'
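
As a rough sketch of the full round trip on Python 3 (assuming Python 3.4 or newer, where html.unescape() is available): encode() returns bytes there, so they have to be decoded as ASCII before unescaping. The helper name roundtrip is only illustrative and not part of any library.

```python
import html


def roundtrip(text):
    # encode() yields ASCII bytes; non-ASCII characters become &#NNNN; references
    escaped = text.encode("ascii", "xmlcharrefreplace")
    # html.unescape() expects str, so decode the ASCII bytes first
    restored = html.unescape(escaped.decode("ascii"))
    return escaped, restored


escaped, restored = roundtrip("опа")
print(escaped)   # b'&#1086;&#1087;&#1072;'
print(restored)  # опа
```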

On older versions (including Python 2), you’d have to use an instance of HTMLParser.HTMLParser() (on Python 3.0 to 3.3 the class is imported from html.parser instead):

>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> parser.unescape('&#1086;&#1087;&#1072;')
u'\u043e\u043f\u0430'
>>> print parser.unescape('&#1086;&#1087;&#1072;')
опа
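
One caveat: both html.unescape() and HTMLParser.unescape() also decode named entities such as &amp;, which xmlcharrefreplace never produces. If the original text may already contain such entities and they should stay untouched, a narrower alternative is to replace only decimal references. The following is a minimal Python 3 sketch of that idea, not a library API; the names _DECIMAL_REF and unescape_decimal_refs are hypothetical, and it assumes every reference is a valid code point (chr() raises ValueError otherwise).

```python
import re

# Matches only decimal character references such as &#1086;
_DECIMAL_REF = re.compile(r"&#([0-9]+);")


def unescape_decimal_refs(s):
    # Replace each &#NNNN; with the character at that code point;
    # named entities like &amp; are left exactly as they are
    return _DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), s)


original = "a &amp; опа"
escaped = original.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)                                      # a &amp; &#1086;&#1087;&#1072;
print(unescape_decimal_refs(escaped) == original)   # True
```

Even this is not a perfect inverse: if the original text literally contained something like &#1086;, it cannot be told apart from a reference that encode() generated.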
Martijn Pieters answered Sep 30 '22 05:09