Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store articles or other large texts in a database

Tags:

database

xml

I am currently in the process of designing myself a database driven website. The main reason is for learning purposes but I wont lie, there is a small amount of vanity included!

While I believe that my database design is pretty good so far, I am still not entirely sure on the best way of storing articles or other large texts. I know most DBMSs have the TEXT datatype or equivalent and can hold a massive amount of text. However, storing a full article as one long string makes for unhappy reading, so formatting is going to be needed.

Do I store the article text along with all of the HTML or BBcode tags - or is it better to simply create the page in either a HTML or XML document and store the path to this file in the DB?

I quite like the idea of storing articles as an XML document as I can easily markup an article with custom tags and use PHP's XML and XSLT functions to transform the XML to HTML [or indeed, any other format]. It also allows the author to dictate when to create line/page breaks. This approach would of course require extra coding [which I am not afraid of] but it does present a problem with making articles searchable.

I know MySQL, for example, has SQL syntax for searching for specific terms/phrases inside strings held in a text field. If I were to store text in separate files, how might I approach making these articles searchable?

There is quite a lot I have written here on such a simple question, so I will break it down:

1: Is there a "best" way of storing large amounts of formatted text directly in a database or
2: is it better to hold paths to that text in the form of HTML/XML/Whatever documents.

If 2, is there an elegant way of making that text searchable?

Thank you for your time :)

like image 610
Etzeitet Avatar asked Jul 05 '09 17:07

Etzeitet


People also ask

How can we store large string in database?

We can use varchar(<maximum_limit>) . The maximum limit that we can pass is 65535 bytes. Note: This maximum length of a VARCHAR is shared among all columns except TEXT/BLOB columns and the character set used.

How do you store a large amount of data in a database?

Using cloud storage. Cloud storage is an excellent solution, but it requires the data to be easily shared between multiple servers in order to provide scaling. The NoSQL databases were specially created for using, testing and developing local hardware, and then moving the system to the cloud, where it works.

How are database articles stored?

The TEXT, BIGTEXT, LONGTEXT and others data types fields were created in order to store large amount of text (64 Kbytes to 4 Gbytes depending of the RDBMS). They just create a binary pointer to locate the text in the database and it is not stored directly in the table.

Can a database store TEXT?

Most SQL databases have a TEXT field type or the equivalent just for the purpose of storing textual data of any length. Many also have full text searching systems in place. Make technical decisions based on technical knowledge and understanding, not feelings.


1 Answers

Store everthing in one big text field as Alex suggested. For searching, don't hammer your database, use Lucene, or htdig to create an index of your output. This way searches are very fast. The side effect is you make your searches a little more search engine friendly; you take your keywords field (as backslash suggested) and stick them in the meta-keywords attribute.

Edit

Unless you are only searching keywords, having the db do the searches will be horribly slow (ever searched a forum and it takes FOREVER?). There is no way for the database to index a

  select.. where FULLTEXTFIELD like '%cookies%'.   

It is frustrating looking for an article and the search doesn't return the results your are looking for because they weren't in the keyword field! Htdig allows you to search the full text of the article efficiently. Your searches will come back instantly, and EVERY term in the article is fully searchable. Putting the keywords in the meta tags will make searches on those terms come higher on the results page.

Another benefit is fuzzy matching. If you search for 'activate' htdigg will match pages that have active, activation, activity etc. (configurable). Or if the user misspells a word, it will still be matched. You want your users to have a Google like experience, not an annoying one. :)

You do need a script to create a list of links to all your pages from your database. Have htdig crawl this automatically and you never have to think about it again.

Also htdig will crawl your non database pages as well so your whole site is searchable through the same simple interface.

As for the keyword field , you should have a separate table called keywords with the id of the article and a keyword field (1 keyword per row). But for simplicity, having a single field in the db isn't a terrible idea, it makes updating the keywords pretty easy if you put it in a form.

If you don't want to fuss with all the hassle of that, you can try using Google custom search. it is far less work, but you have no guarantee that all your pages will get indexed.

Good luck!

like image 120
Byron Whitlock Avatar answered Sep 20 '22 20:09

Byron Whitlock