
Escape special HTML characters in Python

I have a string where special characters like ' or " or & (...) can appear. In the string:

string = """ Hello "XYZ" this 'is' a test & so on """ 

how can I automatically escape every special character, so that I get this:

string = " Hello &quot;XYZ&quot; this &#39;is&#39; a test &amp; so on "
creativz asked Jan 16 '10 12:01




1 Answer

In Python 3.2 and later, you can use the html.escape function, e.g.

>>> string = """ Hello "XYZ" this 'is' a test & so on """
>>> import html
>>> html.escape(string)
' Hello &quot;XYZ&quot; this &#x27;is&#x27; a test &amp; so on '
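The quote handling and the reverse operation can be sketched as follows. This is a minimal example, assuming Python 3.4 or later (html.unescape was added in 3.4); the variable names are illustrative only.

```python
import html

text = """ Hello "XYZ" this 'is' a test & so on """

# Default (quote=True): &, <, > and both kinds of quotes are escaped.
escaped = html.escape(text)

# quote=False: only &, < and > are escaped; quotes are left alone.
escaped_no_quotes = html.escape(text, quote=False)

# html.unescape converts the entities back to characters.
restored = html.unescape(escaped)

print(escaped)
print(escaped_no_quotes)
print(restored == text)
```

Leaving quotes unescaped is only safe for text placed between tags; text placed inside an attribute value needs the default quote=True.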

For earlier versions of Python, check http://wiki.python.org/moin/EscapingHtml:

The cgi module that comes with Python has an escape() function:

import cgi
s = cgi.escape("""& < >""")   # s = "&amp; &lt; &gt;"

However, it doesn't escape characters beyond &, <, and >. If it is used as cgi.escape(string_to_escape, quote=True), it also escapes ".
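Note that cgi.escape was deprecated in Python 3.2 and removed in Python 3.8, so the call above fails on current interpreters. A rough modern equivalent, offered as a sketch and not as a drop-in replacement, is html.escape with an explicit quote argument:

```python
import html

# Closest match to the old default cgi.escape(s): only &, < and > are escaped.
s = html.escape("""& < >""", quote=False)

# With quote=True, html.escape escapes both " and ',
# whereas cgi.escape(s, quote=True) escaped only ".
q = html.escape('say "hi"', quote=True)

print(s)
print(q)
```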


Here's a small snippet that will let you escape quotes and apostrophes as well:

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)
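The same lookup table can also be applied with str.translate instead of a character-by-character join. This is a sketch of that variation in Python 3; the name html_escape_translate is hypothetical and does not come from the wiki snippet.

```python
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

# str.maketrans accepts a dict whose keys are single characters
# and whose values are replacement strings of any length.
_TRANSLATION = str.maketrans(html_escape_table)

def html_escape_translate(text):
    """Replace each special character with its entity in one pass."""
    return text.translate(_TRANSLATION)

print(html_escape_translate("""a & "b" 'c' <d>"""))
```

Keep in mind that &apos; is defined in XML and HTML5 but not in HTML 4, so &#x27; or &#39; is the safer choice if older user agents matter.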

You can also use escape() from xml.sax.saxutils to escape HTML. This function should execute faster. The unescape() function of the same module accepts a dictionary in the same way, mapping each entity back to its character, to decode a string.

from xml.sax.saxutils import escape, unescape

# escape() and unescape() take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;",
}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

def html_escape(text):
    return escape(text, html_escape_table)

def html_unescape(text):
    return unescape(text, html_unescape_table)
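A short round trip illustrates how the two tables fit together. This is a sketch that repeats the tables so it runs on its own; the variable names original, encoded and decoded are illustrative.

```python
from xml.sax.saxutils import escape, unescape

# Extra entities beyond the &, < and > that saxutils handles by default.
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

original = """ Hello "XYZ" this 'is' a test & so on """

# escape() replaces & first, then applies the extra table.
encoded = escape(original, html_escape_table)

# unescape() applies the extra table and replaces &amp; last.
decoded = unescape(encoded, html_unescape_table)

print(encoded)
print(decoded == original)
```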
kennytm answered Sep 17 '22 15:09
