Possible Duplicate:
Decode HTML entities in Python string?
I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;.
Do any Python libraries offer reliable ways for me to replace all of these escape characters with their respective actual characters?
For instance, I want all &quot;s replaced with "s.
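You want to use this (HTMLParser.unescape was removed in Python 3.9, so on Python 3.4 and later use html.unescape instead):
    try:
        from html import unescape  # Python 3.4+
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
        unescape = HTMLParser().unescape  # bound method of a parser instance
    html_decoded_string = unescape(html_encoded_string)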
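For a concrete illustration on Python 3.4 or later, here is a minimal sketch built on the standard library's html.unescape. The helper name decode_entities and the sample string are made up for this example and are not part of the original answer.

```python
import html


def decode_entities(text):
    """Return text with HTML character references replaced by real characters."""
    # html.unescape handles named (&quot;), decimal (&#8212;) and
    # hexadecimal (&#x2014;) character references.
    return html.unescape(text)


# Hypothetical sample containing the entities mentioned in the question.
sample = "&quot;Hello&quot; &mdash; she said &ldquo;hi&rdquo; &amp; left"
decoded = decode_entities(sample)
print(decoded)
```

Text that contains no entities passes through unchanged, so the helper should be safe to call on strings that are only partly escaped.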
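I am also seeing a lot of love for BeautifulSoup. With the old BeautifulSoup 3 API (Python 2 only):
    from BeautifulSoup import BeautifulSoup
    html_decoded_string = BeautifulSoup(html_encoded_string, convertEntities=BeautifulSoup.HTML_ENTITIES)
In BeautifulSoup 4 (the bs4 package) the convertEntities argument no longer exists; entities are always converted to Unicode characters when the document is parsed.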
Also a duplicate of these existing questions:
Decode HTML entities in Python string?
Decoding HTML entities with Python
Decoding HTML Entities With Python