Is there a standard, preferably Pythonic, way to convert the &#xxxx;
notation to a proper unicode string?
For example,
מפגשי
Should be converted to:
מפגשי
It can be done - quite easily - using string manipulations, but I wonder if there's a standard library for this.
Use HTMLParser.HTMLParser()
:
>>> from HTMLParser import HTMLParser
>>> h = HTMLParser()
>>> s = "מפגשי"
>>> print h.unescape(s)
מפגשי
It's part of the standard library, too.
However, if you're using Python 3, you have to import from html.parser
:
>>> from html.parser import HTMLParser
>>> h = HTMLParser()
>>> s = 'מפגשי'
>>> print(h.unescape(s))
מפגשי
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With