Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database blobs vs Disk stored files

So I have this requirement that says the app must let users upload and download about 6000 files per month (mostly pdf, doc, xls).

I was thinking about the optimal solution for this. Question is whether I'll use BLOb's in my database or a simple file hierarchy for writing/reading these bunch of files.

The app architecture is based on Java 1.6, Spring 3.1 and DOJO, Informix 10.X.

So I'm here just to be advised based on your experience.

like image 230
Alex Pi Avatar asked Jul 11 '12 17:07

Alex Pi


2 Answers

When asking what's the "best" solution, it's a good idea to include your evaluation criteria - speed, cost, simplicity, maintenance etc.

The answer Mikko Maunu gave is pretty much on the money. I haven't used Informix in 20 years, but most databases are a little slow when dealing with BLOBs - especially the step of getting the BLOB into and out of the database can be slow.

That problem tends to get worse as more users access the system simultaneously, especially if they use a web application - the application server has to work quite hard to get the files in and out of the database, probably consumes far more memory for those requests than normal, and probably takes longer to complete the file-related requests than for "normal" pages.

This can lead to the webserver slowing down under only moderate load. If you choose to store the documents in your database, I'd strongly recommend running some performance tests to see if you have a problem - this kind of solution tends to expose flaws in your setup that wouldn't otherwise come to light (slow network connection to your database server, insufficient RAM in your web servers, etc.)

To avoid this, I've stored the "master" copies of the documents in the database, so they all get backed up together, and I can ask the database questions like "do I have all the documents for user x?". However, I've used a cache on the webserver to avoid reading documents from the database more than I needed to. This works well if you have a "write once, read many" time solution like a content management system, where the cache can earn its keep.

like image 73
Neville Kuyt Avatar answered Sep 19 '22 05:09

Neville Kuyt


If you have other data in database in relation to these files, storing files to file system makes it more complex:

  1. Back-up should be done separately.
  2. Transactions have to be separately implemented (as far as even possible for file system operations).
  3. Integrity checks between database and file system structure do not come out of the box.
  4. No cascades: removing users pictures as consequence of removing user.
  5. First you have to query for path of file from database and then pick one from file system.

What is good with file system based solution is that sometimes it is handy to be able to directly access files, for example copying part of the images somewhere else. Also storing binary data of course can dramatically change size of database. But in any case, more disk storage is needed somewhere with both solutions.

Of course all of this can ask more DB resources than currently available. There can be in general significant performance hit, especially if decision is between local file system and remote DB. In your case (6000 files monthly) raw performance will not be problem, but latency can be.

like image 25
Mikko Maunu Avatar answered Sep 22 '22 05:09

Mikko Maunu