Possible Duplicate:
Decode HTML entities in Python string?
I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;.
Do any Python libraries offer reliable ways for me to replace all of these escape characters with their respective actual characters?
For instance, I want every &quot; replaced with ".
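You want html.unescape on Python 3.4+ (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9), with a fallback for Python 2:
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape  # bound method, same call shape
html_decoded_string = unescape(html_encoded_string)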
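To make the behaviour concrete, here is a minimal sketch using the standard library's html.unescape on Python 3.4 or later. The helper name decode_entities and the sample string are illustrative assumptions, not part of the question.

```python
import html


def decode_entities(text):
    """Return text with HTML character references replaced by real characters."""
    # html.unescape handles named references (&quot;), decimal references
    # (&#8212;) and hexadecimal references (&#x2014;).
    return html.unescape(text)


# Hypothetical input containing the three entities from the question.
sample = "&quot;quoted&rdquo; &mdash; done"
decoded = decode_entities(sample)
print(decoded)
```

Text that contains no entities is returned unchanged, so the helper can be applied to strings without checking them first.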
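BeautifulSoup is also a popular choice. The following uses the BeautifulSoup 3 API; BeautifulSoup 4 always converts entities to Unicode and no longer accepts the convertEntities argument: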
from BeautifulSoup import BeautifulSoup
html_decoded_string = BeautifulSoup(html_encoded_string, convertEntities=BeautifulSoup.HTML_ENTITIES)
This is also a duplicate of these existing questions:
Decode HTML entities in Python string?
Decoding HTML entities with Python
Decoding HTML Entities With Python