Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Considerations where to store documents - on file server or in DB?

I have a design decision to make regarding documents uploaded to my web site: I can either store them on my file server somewhere, or I can store them as a blob in my database (MSSQL 2005). If it makes any difference to the design decision, these documents are confidential and must have a certain degree of protection.

The considerations I've thought of are:

  1. Storing on the file server makes for HUUUUUUUGE numbers of files all dumped in a single directory, and therefore slower access, unless I can work out a reasonable semantic definition for a directory tree structure
  2. OTOH, I'm guessing that the file server can handle compression somewhat better than the DB... or am I wrong?
  3. My instincts tell me that the DB's security is stronger than the file server's, but I'm not sure if that's necessarily true.
  4. Don't know how having terabytes of blobs in my DB will affect performance.

I'd very much appreciate some recommendations here. Thanks!

like image 725
Shaul Behr Avatar asked Dec 22 '22 05:12

Shaul Behr


2 Answers

In SQL Server 2005, you only have the choice of using VARBINARY(MAX) to store the files inside the database table, or then keep them outside.

The obvious drawback of leaving them outside the database is that the database can't really control what happens to them; they could be moved, renamed, deleted.....

SQL Server 2008 introduces the FILESTERAM attribute on VARBINARY(MAX) types, which allows you to leave the files outside the database table, but still under transactional control of the database - e.g. you cannot just delete the files from the disk, the files are integral part of the database and thus get copied and backed up with it. Great if you need it, but it could make for some huge backups! :-)

The SQL Server 2008 launch presented some "best practices" as to when to store stuff in the database directly, and when to use FILESTREAM. These are:

  • if the files are typically less than 256 KB in size, the database table is the best option
  • if the files are typically over 1 MB in size, or could be more than 2 GB in size, then FILESTREAM (or in your case: plain old filesystem) is your best choice
  • no recommendation for files between those two margins

Also, in order not to negatively impact performance of your queries, it's often a good idea to put the large files into a separate table alltogether - don't have the huge blobs be part of your regular tables which you query - but rather create a separate table, which you only ever query against, if you really need the megabytes of documents or images.

So that might give you an idea of where to start out from!

like image 96
marc_s Avatar answered Jan 01 '23 08:01

marc_s


I strongly suggest you to consider the filesystem solution. The reasons are:

  • you have better access to the files (precious in case of debugging), meaning that you can use regular console-based tools
  • you can quickly and easily take advantage of the OS to distribute the load, for example using a distributed filesystem, add redundancy via a hardware RAID etc.
  • you can take advantage of the OS access control lists to enforce permissions.
  • you don't clog your database

If you are worried about large amounts of entries in your directories, you can always create a branching schema. for example:

filename : hello.txt
filename md5: 2e54144ba487ae25d03a3caba233da71
final filesystem position: /path/2e/54/hello.txt
like image 35
Stefano Borini Avatar answered Jan 01 '23 08:01

Stefano Borini