
How can I use Python to replace HTML escape characters? [duplicate]

Tags:

python

Possible Duplicate:
Decode HTML entities in Python string?

I have a string full of HTML escape characters such as &quot;, &rdquo;, and &mdash;.

Do any Python libraries offer reliable ways for me to replace all of these escape characters with their respective actual characters?

For instance, I want every &quot; replaced with ".

dangerChihuahua007 asked Jul 10 '12 02:07





1 Answer

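Use the standard library's unescape function. On Python 3.4+ it lives in the html module; HTMLParser.unescape was removed in Python 3.9, so it is only used here as the Python 2 fallback:

try:
    from html import unescape  # Python 3.4+
except ImportError:
    # Python 2: fall back to the HTMLParser method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# html_encoded_string is the input string containing HTML entities
html_decoded_string = unescape(html_encoded_string)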
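To make the effect concrete, here is a minimal sketch assuming Python 3.4 or later, where `html.unescape` is available in the standard library. The sample string is hypothetical and only illustrates the kinds of entities mentioned in the question (named and numeric):

```python
from html import unescape  # standard library, Python 3.4+

# Hypothetical sample input mixing named and numeric entities
html_encoded_string = "&quot;Hello&rdquo; &mdash; caf&#233; &amp; more"

# unescape() replaces each entity with the character it stands for
html_decoded_string = unescape(html_encoded_string)
print(html_decoded_string)
```

Text that contains no entities passes through unchanged, so it should be safe to call this on strings that are only partly escaped.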

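BeautifulSoup is also a popular choice. With BeautifulSoup 3 (Python 2 only):

from BeautifulSoup import BeautifulSoup
# The constructor returns a BeautifulSoup object; unicode() turns it into a string
html_decoded_string = unicode(BeautifulSoup(html_encoded_string, convertEntities=BeautifulSoup.HTML_ENTITIES))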

This is also a duplicate of these existing questions:

Decode HTML entities in Python string?

Decoding HTML entities with Python

Decoding HTML Entities With Python

Francis Yaconiello answered Sep 28 '22 10:09
