 

What's the standard way to store formatted content in the database?

I have an application that involves storing and retrieving lots of user-formatted content using a WYSIWYG HTML editor, kind of like how SO saves formatted questions and answers.

What's the standard approach to do this?

EDIT:

Just to clarify: I am not asking about the data type to store in the DB. Rather, I am concerned about storing chunks of HTML tags with style information in the DB.

Rat Salad asked Mar 06 '14 16:03


1 Answer

This is just text data. Usually a VARCHAR is best.

UPDATE: Yes, if you want to support Unicode (which you probably do in this case) then make that an NVARCHAR.

As for the OP's update, you are imagining difficulties which don't really exist. HTML is textual data, so it goes into a text field. You do not want to separate the formatting from the text at all.
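As a minimal sketch of "HTML goes into a text field", the snippet below stores and reloads a chunk of markup unchanged. It assumes Python with the standard-library `sqlite3` module purely for illustration; the table name `posts`, the column `body_html`, and the helper functions are hypothetical, and SQLite's `TEXT` type stands in for the NVARCHAR column you would declare on your own server.

```python
import sqlite3


def create_schema(conn):
    # TEXT plays the role of the NVARCHAR column discussed in the answer;
    # the table and column names are made up for this example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, "
        "body_html TEXT NOT NULL)"
    )


def save_post(conn, body_html):
    # Bind the markup as a parameter so quotes and tags inside it
    # cannot break (or inject into) the SQL statement.
    cur = conn.execute("INSERT INTO posts (body_html) VALUES (?)", (body_html,))
    conn.commit()
    return cur.lastrowid


def load_post(conn, post_id):
    # Return the stored markup exactly as it was saved, or None if missing.
    row = conn.execute(
        "SELECT body_html FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return row[0] if row else None
```

The markup, including its style attributes, comes back byte for byte; the database never needs to understand it.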

That is the answer, but it isn't the end of your concerns on this matter. Doing this probably bothers you because databases work with structured data (every value sits in a named, typed column) and this is unstructured content, so the data in this field is not stored in a DB-friendly manner. You should structure your data as much as possible because that lets you search quickly by field values. Here we are throwing anything the user types into one field, and if we ever need to find data in it we'll have to scan the entire field. This is a very slow process, and to make things worse we aren't just searching through the text but also through the formatting for that text.

This is all true and not good, so we should avoid doing it as much as possible. If you can avoid letting users enter free-form text, then do so by all means. You can then apply HTML formatting to the data from your client application in a fast and consistent manner.
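To illustrate applying the formatting in the application instead of storing it, here is a small sketch. It assumes Python and two hypothetical structured fields, `name` and `city`, that would each live in their own column; the function name and the markup are invented for the example.

```python
from html import escape


def render_contact(name, city):
    # Escape the stored values so user input cannot inject markup,
    # then wrap them in the same HTML every time.
    return (
        '<div class="contact">'
        f"<strong>{escape(name)}</strong>, {escape(city)}"
        "</div>"
    )
```

Because the columns hold plain values, they stay searchable by field, and the presentation can change later without touching the stored data.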

However, the basis of this question is that you want a field of unstructured content and you are asking how to store it. That answer is pretty simple (even though I guess I didn't get it 100% correct on the first try): use NVARCHAR.

Even though storing this unstructured content is not DB friendly, it is sometimes website friendly, and it is a common practice in the situation you are describing. The thing to remember is that we want to avoid searching on this unstructured data, and we may need to go to fairly extreme measures to do so.

Many applications solve this slow search problem by creating a separate table: they parse the text out of the HTML and insert each individual word (along with the foreign key for the original table's entry) into that table to be searched on later. Even if you do this, you'll still want to keep your original formatted text for display purposes.
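The word-table idea could be sketched roughly as follows. This assumes Python with the standard-library `sqlite3` and `html.parser` modules; the tables `posts` and `post_words` and all function names are hypothetical. It is deliberately naive: it indexes every run of word characters, does not skip `<script>` or `<style>` contents, and treats text on either side of a tag as separate words.

```python
import re
import sqlite3
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text between tags, dropping tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_words(body_html):
    # Parse the markup, keep the text nodes, and return distinct lowercase words.
    parser = _TextExtractor()
    parser.feed(body_html)
    parser.close()
    text = " ".join(parser.parts)
    return set(re.findall(r"\w+", text.lower()))


def create_index_schema(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS posts (
            id INTEGER PRIMARY KEY,
            body_html TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS post_words (
            word TEXT NOT NULL,
            post_id INTEGER NOT NULL REFERENCES posts(id),
            PRIMARY KEY (word, post_id)
        );
    """)


def save_post_indexed(conn, body_html):
    # Keep the original formatted text for display...
    cur = conn.execute("INSERT INTO posts (body_html) VALUES (?)", (body_html,))
    post_id = cur.lastrowid
    # ...and store one row per distinct word for searching.
    conn.executemany(
        "INSERT INTO post_words (word, post_id) VALUES (?, ?)",
        [(word, post_id) for word in extract_words(body_html)],
    )
    conn.commit()
    return post_id


def find_posts(conn, word):
    # Lookup by the (word, post_id) primary key instead of scanning the HTML.
    rows = conn.execute(
        "SELECT post_id FROM post_words WHERE word = ? ORDER BY post_id",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

A search for a word now hits the key on `post_words` and never touches the markup, so tag names and style attributes such as `color` do not produce false matches. Many database servers also ship a built-in full-text index that may make a hand-rolled table like this unnecessary.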

I generally make this type of optimization Phase II because the site will function without such optimizations; it'll just be slower and that isn't going to even be noticed until the site has plenty of content to search through.

One other thing to note is that often this will not be HTML-formatted text. Several other formats are commonly used, such as BBCode or Markdown. SQL doesn't care, though; to your SQL server this is all just text.

krowe answered Oct 15 '22 14:10