What's the easiest way to escape HTML in Python?

cgi.escape is fine. It escapes:

< to &lt;
> to &gt;
& to &amp;

That is enough for text placed in HTML element content; attribute values additionally need their quotes escaped.

EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:

data.encode('ascii', 'xmlcharrefreplace')

Don't forget to decode data to unicode first, using whatever encoding it was encoded in.

However, in my experience that kind of encoding is useless if you work with unicode from the start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).

Example:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;'
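
On Python 3 the same idea might look like the sketch below. It assumes html.escape as the stand-in for cgi.escape, and the helper name to_ascii_html is made up for illustration. The result is a bytes object, because encode() is applied last.

```python
import html

def to_ascii_html(text):
    """Escape markup characters, then turn non-ASCII characters into numeric character references."""
    # html.escape handles &, <, > (and quotes by default);
    # 'xmlcharrefreplace' rewrites anything outside ASCII as &#NNN;.
    return html.escape(text).encode('ascii', 'xmlcharrefreplace')

print(to_ascii_html('<a>bá</a>'))  # b'&lt;a&gt;b&#225;&lt;/a&gt;'
```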

Also worth noting (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in an XML/HTML attribute.

EDIT: Note that cgi.escape was deprecated in Python 3.2 in favor of html.escape, and removed in Python 3.8. html.escape does the same, except that quote defaults to True and, when it is set, single quotes (') are escaped as well.
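
To illustrate the effect of the quote parameter, here is a small sketch using html.escape. The variable names and the sample input are hypothetical.

```python
import html

user_input = 'say "hi" & <wave>'

# Default (quote=True): the result is safe inside a quoted attribute value.
attr = html.escape(user_input)
print('<input value="%s">' % attr)

# quote=False leaves " and ' alone: fine for element text, not for attributes.
body = html.escape(user_input, quote=False)
print('<p>%s</p>' % body)
```

With quote=False, the double quotes in the input would end the attribute value early, which is why the default is the safer choice whenever the destination is not known in advance.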

In Python 3.2 a new html module was introduced, which is used for escaping reserved characters from HTML markup.

It has one function escape():

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'
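
The module has since gained a counterpart, html.unescape (Python 3.4 and later), which reverses the conversion. A minimal round-trip sketch, with illustrative variable names:

```python
import html

original = 'x > 2 && x < 7'
escaped = html.escape(original)
restored = html.unescape(escaped)

print(escaped)               # x &gt; 2 &amp;&amp; x &lt; 7
print(restored == original)  # True
```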

If you wish to escape HTML in a URL:

This is probably NOT what the OP wanted (the question doesn't clearly indicate the context in which the escaping is to be used), but Python's standard library module urllib has a function, quote, that percent-encodes characters so they can be included safely in a URL.

The following is an example:

#!/usr/bin/env python3
from urllib.parse import quote  # on Python 2: from urllib import quote

x = '+<>^&'
print(quote(x))  # prints %2B%3C%3E%5E%26
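
When such a URL is itself written into an HTML attribute, both kinds of escaping can apply. The following is a rough sketch for Python 3, where quote lives in urllib.parse. The example.com URL and the parameter names are placeholders, not part of the original answer.

```python
import html
from urllib.parse import quote

term = '+<>^&'

# Percent-encode the value before putting it in the query string.
url = 'https://example.com/search?q=' + quote(term) + '&page=2'

# Then HTML-escape the whole URL before placing it in an attribute,
# so the & separating the parameters becomes &amp;.
link = '<a href="%s">results</a>' % html.escape(url)
print(link)
```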


There is also the excellent markupsafe package.

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

The markupsafe package is well engineered, and probably the most versatile and Pythonic way to go about escaping, IMHO, because:

the return value (Markup) is an instance of a class derived from unicode (i.e. isinstance(escape('str'), unicode) == True)
it properly handles unicode input
it works on Python 2.6, 2.7, 3.3, and PyPy
it respects custom methods of objects (i.e. objects with an __html__ method) and template overloads (__html_format__).

cgi.escape should be good to escape HTML in the limited sense of escaping the HTML tags and character entities.

But you might have to also consider encoding issues: if the HTML you want to quote has non-ASCII characters in a particular encoding, then you would also have to take care that you represent those sensibly when quoting. Perhaps you could convert them to entities. Otherwise you should ensure that the correct encoding translations are done between the "source" HTML and the page it's embedded in, to avoid corrupting the non-ASCII characters.

No libraries, pure python, safely escapes text into html text:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).replace('\'','&#39;').replace('"','&#34;').encode('ascii', 'xmlcharrefreplace')
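
Wrapped in a function, the chain above might look like the sketch below. The name escape_text is hypothetical. The ampersand has to be replaced first; otherwise the & in the entities produced by the later replacements would itself be escaped.

```python
def escape_text(text):
    """Escape &, <, >, ' and " by hand, then encode non-ASCII as character references."""
    escaped = (text.replace('&', '&amp;')   # must come first
                   .replace('>', '&gt;')
                   .replace('<', '&lt;')
                   .replace("'", '&#39;')
                   .replace('"', '&#34;'))
    # On Python 3 this returns bytes, since encode() is the last step.
    return escaped.encode('ascii', 'xmlcharrefreplace')

print(escape_text('<b>"Tom" & \'Jerry\'</b>'))
```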

`cgi.escape` extended

This version improves cgi.escape. It also preserves whitespace and newlines. Returns a unicode string.

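import cgi  # cgi.escape was removed in Python 3.8; html.escape replaces it there

def escape_html(text):
    """Escape a string for display in HTML, keeping newlines, tabs, and double spaces visible."""
    return (cgi.escape(text, quote=True)
            .replace(u'\n', u'<br />')
            .replace(u'\t', u'&emsp;')
            .replace(u'  ', u' &nbsp;'))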

For example:

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'
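
On Python 3.8 and later, where cgi.escape no longer exists, a comparable helper could be built on html.escape. This is a sketch under that assumption, not the answer's original code; note that html.escape also escapes single quotes, which cgi.escape did not.

```python
import html

def escape_html(text):
    """Escape a string for display in HTML, keeping newlines, tabs, and double spaces visible."""
    return (html.escape(text, quote=True)
            .replace('\n', '<br />')
            .replace('\t', '&emsp;')
            .replace('  ', ' &nbsp;'))

print(escape_html('<foo>\nfoo\t"bar"'))
# &lt;foo&gt;<br />foo&emsp;&quot;bar&quot;
```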
