
What are the reasons not to allow HTML tables when validating user input fields?

Tags: markdown, xss

I'm writing a small wiki and going through my options for markup syntax. I'm debating between wiki syntax (MediaWiki) and Markdown plus whitelisted HTML tags. I would prefer the latter, but I think my users will need tables. Why are tables disallowed here on Stack Overflow?

<table> <tr> <td> </td> </tr> </table>

Shawn asked Jan 23 '09 03:01


3 Answers

They serve no purpose in a Q&A format. At least I cannot think of a reason I would need to use a table to answer someone's question, or ask one myself.

Plus, you can do this anyway:

cell 1-1      cell 1-2
cell 2-1      cell 2-2

EDIT: After reading the comments on my reply, I see that there may be a few cases where a table would be a better visual aid. So I'm going to recommend a Markdown-style table syntax similar to CSV; I think that's easy enough to type and to implement.
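A minimal sketch of that idea, assuming the wiki hands a CSV-style block to a helper that emits the table markup itself. The function name `csv_block_to_html_table` and the comma delimiter are assumptions for illustration, not part of any Markdown implementation. Every cell is escaped, so users never type raw HTML:

```python
import csv
import html
import io


def csv_block_to_html_table(block: str) -> str:
    """Render a CSV-style text block as an HTML table (hypothetical helper)."""
    # Parse the block with the standard csv module; skip empty lines.
    rows = [row for row in csv.reader(io.StringIO(block.strip())) if row]
    lines = ["<table>"]
    for row in rows:
        # Escape each cell so user text can never inject markup.
        cells = "".join(f"<td>{html.escape(cell.strip())}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_block_to_html_table("cell 1-1, cell 1-2\ncell 2-1, cell 2-2"))
```

Because the site generates every tag, the output is always well formed, whatever the user typed.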

geowa4 answered Oct 08 '22 19:10


Disallowing tables is a good idea if your site's layout is built on tables and you can't write a regex good enough to validate that the user's HTML is syntactically correct; otherwise, malformed markup could break your layout.

Even if your site is not laid out with tables, two sets of malformed table HTML in comments, posts, etc. could lead to your site being defaced.
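As a rough illustration of that validation step, here is a sketch that checks whether the table tags in a fragment are properly nested. It swaps the regex for Python's standard `html.parser`, since nesting is hard to check with a regex. The names `TableBalanceChecker` and `table_markup_is_balanced` are made up for this example, and the check is deliberately stricter than HTML itself, which lets some closing tags such as `</td>` be omitted:

```python
from html.parser import HTMLParser

# Tags whose nesting we care about; everything else is ignored.
TABLE_TAGS = {"table", "thead", "tbody", "tr", "th", "td"}


class TableBalanceChecker(HTMLParser):
    """Tracks open table tags on a stack (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag in TABLE_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in TABLE_TAGS:
            # A close tag must match the most recently opened table tag.
            if not self.stack or self.stack.pop() != tag:
                self.balanced = False


def table_markup_is_balanced(fragment: str) -> bool:
    checker = TableBalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Balanced means no mismatched closes and nothing left open.
    return checker.balanced and not checker.stack
```

A fragment such as `<table><tr><td>oops` is rejected, so it cannot leave a table open across the rest of the page.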

cjk answered Oct 08 '22 18:10


Three reasons:

  • compatibility with arbitrary Markdown implementations,
  • safe user input,
  • layout-independent content

Standard Markdown does not support tables. It is intended to read just like plain-text e-mail. SO uses standard Markdown, so no tables.

Some Markdown extensions support tables, but they are not compatible with each other. That defeats the purpose of Markdown, because the content becomes dependent on a particular Markdown implementation.

So tables can be made only with HTML inside Markdown, which is also not good. I am sure that Markdown2PDF, Markdown2TeX and Markdown2TheNextBigML converters are easy to write. Converting Markdown with embedded HTML to anything but HTML is not trivial. So there is no point in storing everything in Markdown (plain text) if (some) embedded HTML is allowed.

Another reason to sanitize all user-submitted HTML is obvious: it is too difficult and expensive to parse properly, and it can break the layout (e.g. <table width="10000" height="10000">).

Finally, there is a huge benefit in lightweight (pure Markdown) markup: it does not depend on a particular site layout (screen width, paddings, margins, justification, column widths, etc.). So if an SO redesign happens a year from now, the content does not need to be edited (HTML snippets implicitly depend on particular CSS). An additional bonus: the content is easier to use in third-party applications (like mobile phone clients).
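To make the whitelist idea concrete, here is a small sketch built on Python's standard `html.parser`. The `ALLOWED_TAGS` set and the names `WhitelistSanitizer` and `sanitize` are assumptions for illustration. It keeps only whitelisted tags, drops every attribute (so `width="10000"` disappears), and escapes all text. It does not repair unbalanced tags, and a real site would be better served by a vetted sanitizer than by this toy:

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to whatever the site wants to permit.
ALLOWED_TAGS = {"table", "tr", "td", "th", "b", "i", "em", "strong", "p"}


class WhitelistSanitizer(HTMLParser):
    """Rebuilds a fragment from whitelisted tags only (illustrative)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are intentionally discarded.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never be read as markup.
        self.parts.append(html.escape(data))


def sanitize(fragment: str) -> str:
    sanitizer = WhitelistSanitizer()
    sanitizer.feed(fragment)
    sanitizer.close()
    return "".join(sanitizer.parts)
```

Rebuilding the output from parsed events, instead of deleting bad pieces from the input string, means that anything the parser does not recognize as a whitelisted tag is simply never emitted.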

sastanin answered Oct 08 '22 18:10