When you are developing a web-based application and want to accept richly formatted text from users, you have to choose how to allow that input. Many different markup languages have been created, arguably because HTML is difficult to sanitize.
What are the advantages and disadvantages of the various markup languages, such as Markdown, BBCode, Textile, MediaWiki markup, and HTML?
Or to put it differently, what factors do you consider when choosing a particular markup language?
The most widely used markup languages are SGML (Standard Generalized Markup Language), HTML (Hypertext Markup Language), and XML (Extensible Markup Language).
Tags are used to identify content.
HTML is the best-known markup language on the list, and it is also the most forgiving. However, it has a limited number of basic elements, so we'll use it as a gentle introduction to XML.
Jeff discussed some pros and cons on codinghorror.com while they were in the initial stages of putting together SO. I thought it was a worthwhile read.
Markdown, BBCode, Textile, and MediaWiki markup are all basically the same general concept, so I would really just lump these into two categories: HTML, and plain text markup.
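To make the "plain text markup" idea concrete, here is a minimal sketch of how such a format is usually rendered: escape everything the user typed, then convert a small, known set of tags into HTML. The tag subset, the `render_markup` name, and the `rel="nofollow"` attribute are assumptions for illustration only; real Markdown or BBCode libraries handle far more cases (nesting errors, lists, code blocks).

```python
import html
import re

# Hypothetical, deliberately tiny BBCode-style subset: [b], [i], and [url=...].
_SIMPLE_TAGS = {"b": "strong", "i": "em"}
_URL_RE = re.compile(r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]", re.DOTALL)


def render_markup(text: str) -> str:
    """Escape all user input first, then turn the allowed tags into HTML."""
    # After this call the text contains no raw <, >, &, or quote characters.
    out = html.escape(text, quote=True)
    for tag, element in _SIMPLE_TAGS.items():
        pattern = re.compile(r"\[%s\](.*?)\[/%s\]" % (tag, tag), re.DOTALL)
        out = pattern.sub(r"<%s>\1</%s>" % (element, element), out)
    # Only http/https links are accepted; anything else stays as literal text.
    out = _URL_RE.sub(r'<a href="\1" rel="nofollow">\2</a>', out)
    return out


print(render_markup("[b]hi[/b] <script>"))
# prints: <strong>hi</strong> &lt;script&gt;
```

Because the only HTML in the output is HTML the renderer itself produced, there is nothing left to sanitize afterwards, which is a large part of the appeal of this category.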
The deal with HTML is that the content is already in a "presentable" form for web content. That's great, it saves processing time, and it's a readily parseable language. There are dozens of libraries in pretty much any language to handle HTML content, convert between HTML and other formats, etc. The main downside is that because of the loose standards of the early web days, HTML can be incredibly variable, and you can't always depend on sane input when accepting HTML from users. As pointed out, tidying or sanitizing HTML is often very difficult, especially because it fails to follow normal markup rules the way XML does (for example, improperly closed tags are common).
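For comparison, here is a rough sketch of what accepting HTML directly involves, using Python's standard-library `html.parser`. It keeps only an allowlist of tags, drops every attribute, and repairs improperly closed tags. The `ALLOWED_TAGS` set and the `sanitize` helper are assumptions made up for this example, and this is an illustration of the problem rather than a production sanitizer; for real user input a maintained sanitizing library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist for illustration; everything else is dropped.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags, without attributes, and balance them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append("<%s>" % tag)  # attributes are discarded
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close anything opened after the matching tag, then the tag itself.
            while self.open_tags:
                open_tag = self.open_tags.pop()
                self.parts.append("</%s>" % open_tag)
                if open_tag == tag:
                    break

    def handle_data(self, data):
        self.parts.append(escape(data))  # text is always re-escaped


def sanitize(text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(text)
    parser.close()  # flush any buffered text
    # Close tags the user never closed.
    closing = ["</%s>" % tag for tag in reversed(parser.open_tags)]
    return "".join(parser.parts + closing)


print(sanitize("<b>bold <i>both</b> tail"))
# prints: <b>bold <i>both</i></b> tail
```

Even this small example needs a tag stack just to cope with unclosed tags, and it still says nothing about URLs, styles, or entities in attributes, which is where most of the real difficulty lies.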
This category is frequently used for the following reasons:
The bottom line is what the user input is being used for. If you're planning to keep the data around and may need to shuffle formats, then it makes sense to use a careful abstract format to store the information. If you need to work with the raw data manually for any reason, then bonus points if that format is easily human-readable. If you're only displaying the content in a web page (or an HTML document for a report, etc.) and you have no concerns about converting it or future-proofing it, then it's a reasonable practice to store it in HTML.