
Sanitising user input using Python

Tags: python, xss

What is the best way to sanitize user input for a Python-based web application? Is there a single function that removes HTML characters and any other character combinations that need removing to prevent an XSS or SQL injection attack?

Steve asked Aug 19 '08 at 20:08




1 Answer

Here is a snippet that removes all tags not on the tag whitelist and all tag attributes not on the attribute whitelist (so you can't use onclick).

It is a modified version of http://www.djangosnippets.org/snippets/205/, with a regex applied to the attribute values to prevent people from using href="javascript:...", and other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja&#x09;vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)

As you can see, it uses the (awesome) BeautifulSoup library.

    import re
    from urlparse import urljoin
    from BeautifulSoup import BeautifulSoup, Comment

    def sanitizeHtml(value, base_url=None):
        rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
        rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
        re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
        validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()
        validAttrs = 'href src width height'.split()
        urlAttrs = 'href src'.split() # Attributes which should have a URL
        soup = BeautifulSoup(value)
        for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
            # Get rid of comments
            comment.extract()
        for tag in soup.findAll(True):
            if tag.name not in validTags:
                tag.hidden = True
            attrs = tag.attrs
            tag.attrs = []
            for attr, val in attrs:
                if attr in validAttrs:
                    val = re_scripts.sub('', val) # Remove scripts (vbs & js)
                    if attr in urlAttrs:
                        val = urljoin(base_url, val) # Calculate the absolute url
                    tag.attrs.append((attr, val))

        return soup.renderContents().decode('utf8')
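The regex part of the snippet can be tried in isolation. Below is a minimal sketch in Python 3 that uses only the standard `re` module and builds the pattern the same way as `re_scripts` above. The helper name `strip_script_schemes` and the sample strings are illustrative, not part of the original snippet.

```python
import re

# Same construction as in the snippet: between every character of the
# scheme, allow optional whitespace and an optional "&#x..." hex reference.
SEPARATOR = r'[\s]*(&#x.{1,7})?'
rjs = SEPARATOR.join(list('javascript:'))
rvb = SEPARATOR.join(list('vbscript:'))
re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)


def strip_script_schemes(value):
    """Remove javascript:/vbscript: prefixes, including obfuscated forms.

    Hypothetical helper name, used here only for illustration.
    """
    return re_scripts.sub('', value)


samples = [
    "javascript:alert('hi')",         # plain scheme
    "ja&#x09;vascript:alert('hi')",   # hex character reference inside the scheme
    "ja vascript:alert('hi')",        # whitespace inside the scheme
    "http://example.com/",            # ordinary URL, left untouched
]

for sample in samples:
    print(repr(sample), '->', repr(strip_script_schemes(sample)))
```

The pattern only removes the scheme prefix and leaves the rest of the value in place, so it works as one layer on top of the tag and attribute whitelists rather than as a complete filter by itself.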

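As the other posters have said, pretty much all Python DB libraries take care of SQL injection as long as you pass values as query parameters instead of building the SQL string yourself, so together with the snippet above this should pretty much cover you.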
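To illustrate the SQL side, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The `users` table and the `find_user` helper are assumptions made up for this example; other DB-API drivers work along the same lines but may use a different placeholder style (for example `%s`).

```python
import sqlite3

# In-memory database with one hypothetical table, just for the demonstration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute("INSERT INTO users (name) VALUES ('alice')")


def find_user(conn, name):
    """Look up a user by name with a parameterized query.

    The ? placeholder makes the driver bind `name` as a value, so quotes
    inside it are never interpreted as SQL.
    """
    cur = conn.execute('SELECT name FROM users WHERE name = ?', (name,))
    return cur.fetchall()


hostile = "alice' OR '1'='1"

print(find_user(conn, 'alice'))   # matches the stored row
print(find_user(conn, hostile))   # treated as a literal name, so no match
```

The protection comes from passing the value separately from the SQL text. Building the query with string formatting or concatenation bypasses it.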

tghw answered Sep 23 '22 at 00:09