Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Store Documents/Video in database or as separate files?

Is it a better practice to store media files (documents, video, images, and eventually executables) in the database itself, or should I just put a link to them in the database and store them as individual files?

like image 301
Jarred Sumner Avatar asked Feb 19 '11 22:02

Jarred Sumner


People also ask

Should videos be stored in database?

Storing videos and images directly in database is not a viable solution. The ideal way is to store images and videos in CDN and save that file path in DB. Advantage of storing in CDNs, data will be served from edge server's(cached) rather than from source.

Is it better to store files in database or filesystem?

Database provides a proper data recovery process while file system did not. In terms of security the database is more secure then the file system (usually).

Should you store documents in a database?

In my own experience, it is always better to store files as files. The reason is that the filesystem is optimised for file storeage, whereas a database is not.


3 Answers

Read this white paper by MS research (to BLOB or not to BLOB) - it goes in depth about the question.

Executive summary - if you have lots of small (150kb and less) files, you might as well store them in the DB. Of course, this is right for the databases they were testing with and using their test procedures. I suggest reading the article in full to at least gain a good understanding of the trade-offs.

like image 193
Oded Avatar answered Oct 12 '22 13:10

Oded


That is an interesting paper that Oded has linked to - if you are using Sql Server 2008 with its FileStream feature the conclusion is similar. I have quoted a couple of salient points from the linked FileStream whitepaper:

"FILESTREAM storage is not appropriate in all cases. Based on prior research and FILESTREAM feature behavior, BLOB data of size 1 MB and larger that will not be accessed through Transact-SQL is best suited to storing as FILESTREAM data."

"Consideration must also be given to the update workload, as any partial update to a FILESTREAM file will generate a complete copy of the file. With a particularly heavy update workload, the performance may be such that FILESTREAM is not appropriate"

like image 1
Andrew Avatar answered Oct 12 '22 12:10

Andrew


Two requirements drive the answer to your question:

  1. Is there more than one application server reading binaries from the database server?
  2. Do you have a database connection that can stream binaries for write and read?

Multiple application servers pulling binaries from one database server really hinders your ability to scale. Consider that database connections are usually - necessarily - coming from a smaller pool than the application servers' request servicing pool. And, the data volume binaries will consume being sent from database server to application server over the pipe. The database server will likely queue requests because its pool of connections will be consumed delivering binaries.

Streaming is important so that a file is not completely in server memory on read or write (looks like @Andrew's answer about SQL Server 2008 FILESTREAM may speak to this). Imagine a file several gigabytes in size - if read completely into memory - would be enough to crash many application servers, which just don't have the physical memory to accommodate. If you don't have streaming database connections storing in the database is really not viable, unless you constrain file size such that your application server software is allocated at least as much memory as the max file size * number of request servicing connections + some additional overhead.

Now let's say you don't put the files in the database. Most operating systems are very good at caching frequently accessed files. So right off the bat you get an added benefit. Plus, if you're doing web servers, they are pretty good at sending back the right request headers, such as mime type, content length, e-tags, etc... which you otherwise end up coding yourself. The real issues are replication between servers, but most application servers are pretty good at doing this via http - streaming the read and write, and as another answerer pointed out keeping database and file system in sync for backups.

like image 1
orangepips Avatar answered Oct 12 '22 13:10

orangepips