
"Safe" markdown processor for PHP?

Is there a PHP implementation of markdown suitable for using in public comments?

Basically it should only allow a subset of the markdown syntax (bold, italic, links, block-quotes, code-blocks and lists), and strip out all inline HTML (or possibly escape it?)

I guess one option is to use the normal markdown parser, and run the output through an HTML sanitiser, but is there a better way of doing this..?

We're using PHP Markdown Extra for the rest of the site, so we'd already have to use a secondary parser (the non-"Extra" version, since things like footnote support are unnecessary). It also seems nicer to parse only the *bold* text and have everything else escaped to &lt;a href="etc"&gt;, than to generate <b>bold</b> text and try to strip the bits we don't want.
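To make the strip-versus-escape distinction concrete, here is a minimal sketch using only PHP built-ins. The sample string is made up for illustration, and neither call is a complete sanitiser on its own.

```php
<?php
// Hypothetical comment text containing inline HTML.
$input = 'Some <b>bold</b> text and <a href="etc">a link</a>';

// Stripping removes the tags but keeps the text between them.
$stripped = strip_tags($input);

// Escaping keeps everything, so the tags show up as literal text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Some bold text and a link
echo $escaped, "\n";  // Some &lt;b&gt;bold&lt;/b&gt; text and &lt;a href=&quot;etc&quot;&gt;a link&lt;/a&gt;
```

Escaping loses nothing the commenter typed, whereas stripping silently drops it, which is probably why escaping feels nicer for comments.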

Also, on a related note, we're using the WMD control for the "main" site, but what other options are there for comments? WMD's JavaScript preview is nice, but it would need the same "neutering" as the PHP Markdown processor (it mustn't display images and so on, otherwise someone will submit a comment and their working Markdown will "break").

Currently my plan is to use the PHP-markdown -> HTML sanitiser method, and edit WMD to remove the image/heading syntax from showdown.js - but it seems like this has been done countless times before.

Basically:

  • Is there a "safe" markdown implementation in PHP?
  • Is there an HTML/JavaScript markdown editor which could have the same options easily disabled?

Update: I ended up simply running the markdown() output through HTML Purifier.

This way the Markdown rendering is separate from output sanitisation, which is much simpler (two mostly-unmodified code bases), more secure (you're not trying to do both rendering and sanitisation at once), and more flexible (you can have multiple sanitisation levels, say a more lax configuration for trusted content, and a much more stringent version for public comments).
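A rough sketch of that two-stage setup might look like the following. It assumes PHP Markdown (which provides Markdown()) and HTML Purifier are already loaded. The helper names purifier_settings() and render_comment(), the exact element lists, and the allowed URI schemes are illustrative choices, not the configuration actually used on the site.

```php
<?php
// Assumes markdown.php and HTML Purifier are already loaded
// (e.g. via require_once or an autoloader); paths are deliberately omitted.

// Hypothetical helper: HTML Purifier directives for each trust level.
function purifier_settings($trusted = false) {
    // Elements produced by bold, italic, links, block-quotes, code-blocks and lists.
    $allowed = 'p,br,strong,em,a[href|title],blockquote,pre,code,ul,ol,li';
    if ($trusted) {
        // Laxer level (an assumption): also allow headings and images.
        $allowed .= ',h1,h2,h3,h4,h5,h6,img[src|alt|title]';
    }
    return array(
        'HTML.Allowed'       => $allowed,
        'URI.AllowedSchemes' => array('http' => true, 'https' => true, 'mailto' => true),
    );
}

// Hypothetical helper: render the Markdown first, then sanitise the HTML.
function render_comment($text, $trusted = false) {
    $config = HTMLPurifier_Config::createDefault();
    foreach (purifier_settings($trusted) as $directive => $value) {
        $config->set($directive, $value);
    }
    $purifier = new HTMLPurifier($config);
    return $purifier->purify(Markdown($text));
}
```

Public comments would go through render_comment($text), trusted content through render_comment($text, true). Note that disallowed markup is removed by the purifier rather than escaped.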

dbr asked May 19 '09 23:05



1 Answer

PHP Markdown has a sanitizer option, but it doesn't appear to be advertised anywhere. Take a look at the top of the Markdown_Parser class in markdown.php (starts on line 191 in version 1.0.1m). We're interested in lines 209-211:

    # Change to `true` to disallow markup or entities.
    var $no_markup = false;
    var $no_entities = false;

If you change those to true, markup and entities, respectively, should be escaped rather than inserted verbatim. There doesn't appear to be any built-in way to change those (e.g., via the constructor), but you can always add one:

    function do_markdown($text, $safe=false) {
        $parser = new Markdown_Parser;
        if ($safe) {
            $parser->no_markup = true;
            $parser->no_entities = true;
        }
        return $parser->transform($text);
    }

Note that the above function creates a new parser on every run rather than caching it like the provided Markdown function (lines 43-56) does, so it might be a bit on the slow side.
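If the per-call construction turns out to matter, one possible variant keeps a parser per mode in a static variable, much as the bundled Markdown function does for its single parser. This is only a sketch: do_markdown_cached() is a made-up name, it assumes markdown.php is already loaded, and whether it is measurably faster hasn't been tested here.

```php
<?php
// Assumes markdown.php is already loaded, so Markdown_Parser exists.

// Variant of do_markdown() that reuses one parser per mode between calls.
function do_markdown_cached($text, $safe = false) {
    static $parsers = array();
    $key = $safe ? 'safe' : 'normal';
    if (!isset($parsers[$key])) {
        $parser = new Markdown_Parser;
        if ($safe) {
            // Escape inline markup and entities instead of passing them through.
            $parser->no_markup = true;
            $parser->no_entities = true;
        }
        $parsers[$key] = $parser;
    }
    return $parsers[$key]->transform($text);
}
```

Keeping the two modes in separate parser objects avoids flipping the flags back and forth on a shared instance.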

Noah Medling answered Oct 02 '22 15:10
