
What's the best practice for storing huge amounts of text (into a DB or as a file?), and what about compressing it?

I'm building a web-app that handles internal emails and other frequent small-to-medium sized chunks of text between users and clients. What's the best method for storing this data? In a database (MySQL) or as thousands of individual files? What about compressing it (PHP's gzcompress() or MySQL's compression features)?

This will not be a public application, so the user load will be minimal (less than 20 users at a time). However, there will be a lot of communication going back-and-forth every day within the app, so I expect the amount of data to grow quite large as time goes by (which is why I'd like to compress it).

I'd like to keep the data in a database for ease of access and portability, but some of the threads I've seen on here regarding images have suggested using file storage. What do you think?

Thank you, Seth

Edit for clarification: I do not require any sort of searching of the text, which is why I would lean toward compressing it to save on space.

Seth asked Nov 06 '22 22:11



2 Answers

For images and documents that already have a specific format (Excel, Word, PDF, etc.), I prefer file storage. For raw text, though, I would rather use a database: it is easier to replicate across machines for failover, and you can run substring searches over the text. I don't know of a specific compression algorithm to recommend, but I still think a database is the better way to go, provided that what you have is text and only text. Any other document format I would keep in file storage.

And unless I am missing something, I would use a CLOB (in MySQL, one of the TEXT types) instead of a BLOB if it is only text.
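
To make the compression question concrete, here is a minimal sketch of compressing text before storing it in a database. It makes several assumptions for the sake of a self-contained example: it is written in Python rather than PHP, it uses Python's `zlib.compress` (which emits the same zlib data format as PHP's `gzcompress()`), and it uses an in-memory SQLite database as a stand-in for MySQL. The table name `messages`, the column `body_z`, and the helper functions are hypothetical. One thing it illustrates is that compressed output is binary, so it belongs in a BLOB column rather than a text column.

```python
import sqlite3
import zlib


def store_message(conn, body):
    """Compress the text and store it as binary; return the new row id."""
    blob = zlib.compress(body.encode("utf-8"), 6)  # 6 is a middle-of-the-road level
    cur = conn.execute("INSERT INTO messages (body_z) VALUES (?)", (blob,))
    conn.commit()
    return cur.lastrowid


def load_message(conn, message_id):
    """Fetch the compressed bytes and return the original text."""
    row = conn.execute(
        "SELECT body_z FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        raise KeyError(message_id)
    return zlib.decompress(row[0]).decode("utf-8")


# SQLite in memory stands in for MySQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, body_z BLOB NOT NULL)"
)

text = "Hello, this is an internal message. " * 50
msg_id = store_message(conn, text)
restored = load_message(conn, msg_id)
```

Keep in mind that very short messages compress poorly and can even grow by a few bytes because of the zlib header and checksum, so it is worth measuring on real data before committing to this approach.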

Ryan Guill answered Nov 15 '22 05:11



One of the main reasons for keeping the files in a database is to keep them consistent with the rest of the data you are storing. It will be easier to make backups, (re)deploy with predefined datasets, and so on. Furthermore, it is easier to guarantee transactional integrity.
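
As a rough illustration of the transactional point, the sketch below writes a thread record and its message text in one transaction, so either both rows are stored or neither is. It is only a sketch under assumptions: Python with an in-memory SQLite database stands in for the real stack, and the tables `threads` and `messages` and the helper `add_thread_with_message` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (id INTEGER PRIMARY KEY, subject TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, thread_id INTEGER NOT NULL, body TEXT NOT NULL)"
)
conn.commit()


def add_thread_with_message(conn, subject, body):
    """Insert a thread and its first message atomically."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        cur = conn.execute("INSERT INTO threads (subject) VALUES (?)", (subject,))
        thread_id = cur.lastrowid
        conn.execute(
            "INSERT INTO messages (thread_id, body) VALUES (?, ?)",
            (thread_id, body),
        )
    return thread_id


ok_id = add_thread_with_message(conn, "Invoice", "Please find the details below.")

try:
    # A missing body violates NOT NULL, so the thread row is rolled back too.
    add_thread_with_message(conn, "Broken", None)
except sqlite3.IntegrityError:
    pass

thread_count = conn.execute("SELECT COUNT(*) FROM threads").fetchone()[0]
message_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

With text stored as separate files, the equivalent guarantee would need extra cleanup logic for the case where the file write succeeds but the database insert fails, or the other way around.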

One benefit of storing text as files is that a web server can serve them easily. If that is the only remaining benefit of using files, you could look into caching the files on the web server. That will give you much of the easy backup and transactional behavior of the database while still allowing some speedup for HTTP requests.

Simon Groenewolt answered Nov 15 '22 04:11
