Ok, so I have been reading about Markdown here on SO and elsewhere, and the steps between user input and the db are usually given as:

1. convert markdown to html
2. sanitize html (w/whitelist)
3. insert into database

but to me it makes more sense to do the following:

1. sanitize markdown (remove all tags - no exceptions)
2. convert to html
3. insert into database

Am I missing something? This seems to me to be pretty nearly XSS-proof.


Markdown and XSS

4 Answers

Please see this link:

http://michelf.com/weblog/2010/markdown-and-xss/

> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>

Becomes

<blockquote>
 <p>hello <a name="n"
 href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>

∴ you must sanitize after converting to HTML.
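A simplified sketch in core Perl of why the order matters. It assumes a regex is good enough to stand in for a tag scanner and that `s/^> ?//mg` approximates Markdown's blockquote unwrapping; neither is a real parser. In the source, the `>` that starts the second line closes the `<a>` tag early, so anything inspecting tags before conversion sees only `name="n"`. Once the quote markers are removed, the same characters form a single tag carrying the `javascript:` href.

```perl
use strict;
use warnings;

# The Markdown source from the example above.
my $source = qq{> hello <a name="n"\n> href="javascript:alert('xss')">*you*</a>\n};

# Simplified tag scanner (a regex, not an HTML parser): in the source, the
# blockquote marker ">" at the start of line 2 closes the <a> tag early,
# so the href sits outside the tag and looks like plain text.
my ($tag_in_source) = $source =~ /(<a\b[^>]*>)/;

# Simplified stand-in for Markdown's blockquote handling:
# strip a leading "> " from every line.
(my $unquoted = $source) =~ s/^> ?//mg;

# The same scan after unquoting finds one tag that now carries the href.
my ($tag_after) = $unquoted =~ /(<a\b[^>]*>)/;

print "before conversion: $tag_in_source\n";
print "after unquoting:   $tag_after\n";
```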


answered Jan 02 '23 20:01

Jordan Reiter

There are two issues with what you've proposed:

1. I don't see a way for your users to be able to format posts. You took advantage of Markdown to provide nice numbered lists, for example. In the proposed no-tags-no-exceptions world, I'm not seeing how the end user would be able to do such a thing.
2. Considerably more important: when using Markdown as the "native" formatting language, and whitelisting the other available tags, you are limiting not just the input side of the world, but the output as well. In other words, if your display engine expects Markdown and only allows whitelisted content out, then even if (God forbid) somebody gets to the database and injects some nasty malware-laden code into a bunch of posts, the actual site and its users are protected, because you are sanitizing on display as well.
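A toy sketch of sanitizing on display, in core Perl. The in-memory `%posts` hash stands in for the database, and `toy_markdown`, `whitelist_html` and `render_post` are hypothetical helpers: the converter only understands `*emphasis*`, and the whitelist simply escapes everything and then re-enables a few attribute-free tags. A real site would use a proper converter and sanitizer, such as Text::Markdown with HTML::StripScripts::Parser.

```perl
use strict;
use warnings;

# Toy whitelist sanitizer (illustrative only): escape everything, then
# re-enable a few tags that carry no attributes.
sub whitelist_html {
    my ($html) = @_;
    $html =~ s/&/&amp;/g;
    $html =~ s/</&lt;/g;
    $html =~ s/>/&gt;/g;
    $html =~ s/"/&quot;/g;
    $html =~ s{&lt;(/?)(p|em|strong|blockquote)&gt;}{<$1$2>}g;
    return $html;
}

# Hypothetical stand-in for a Markdown converter: only handles *emphasis*.
sub toy_markdown {
    my ($text) = @_;
    $text =~ s{\*([^*]+)\*}{<em>$1</em>}g;
    return "<p>$text</p>";
}

# The "database" stores the raw text exactly as submitted.
my %posts = (1 => 'hello *you*');

# Someone later tampers with a stored row directly, bypassing the input path.
$posts{2} = '<script>alert("xss")</script>';

# Conversion and sanitization both happen on every display.
sub render_post {
    my ($id) = @_;
    return whitelist_html(toy_markdown($posts{$id}));
}

print render_post($_), "\n" for sort keys %posts;
```

Because stored rows are only converted and filtered when they are rendered, the tampered `<script>` row comes out escaped even though it never went through the input path.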

There are some good resources on the web about output sanitization:

Sanitizing user data: Where and how to do it
Output sanitization (One of my clients, who shall remain nameless and whose affected system was not developed by me, was hit with this exact worm. We have since secured those systems, of course.)
BizTech: Best Practices: Never heard of XSS?

answered Jan 02 '23 19:01

John Rudy

Well, certainly removing/escaping all tags would make a markup language more secure. However, the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup (*). When you are allowing HTML, you have to clean/whitelist the output anyway, so you might as well do it after the Markdown conversion to catch everything.

*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.

Incidentally, step 3 should be ‘output to page’: conversion and sanitization normally take place at the output stage, with the database containing the raw submitted text.

answered Jan 02 '23 21:01

bobince

1. insert into database
2. convert markdown to html
3. sanitize html (w/whitelist)

perl

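use strict;
use warnings;
use Text::Markdown ();
use HTML::StripScripts::Parser ();

# Whitelist filter applied to the HTML that Markdown produces.
my $hss = HTML::StripScripts::Parser->new(
   {
       Context         => 'Document',
       AllowSrc        => 0,   # no src attributes, so nothing is fetched automatically
       AllowHref       => 1,   # allow href attributes on links
       AllowRelURL     => 1,   # accept relative URLs as well as absolute ones
       AllowMailto     => 1,   # accept mailto: links
       EscapeFiltered  => 1,   # escape rejected tags instead of replacing them with a comment
   },
   strict_comment => 1,        # options passed through to HTML::Parser
   strict_names   => 1,
);

# Called with the raw Markdown read from the database (sub name is
# illustrative): convert to HTML first, then sanitize the result.
sub render_markdown {
    my ($markdown) = @_;
    return $hss->filter_html(Text::Markdown::markdown($markdown));
}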

answered Jan 02 '23 19:01

Shinichiro Aska
