
Recommended location for document storage: in the database or elsewhere?

Background:

We have an in-house document storage system that was implemented long ago. For whatever reason, the database was chosen as the storage mechanism for the documents.

My question is this:

What is the best practice for storing documents? What are the alternatives? What are the pros and cons? Answers do not have to be technology or platform specific; it is more of a general best-practice question.

My Thoughts:

Databases are not meant for document storage. File systems or third-party document management systems may be better suited. Document storage in databases is expensive. Operations are slow. Are these logical assumptions? Perhaps the current approach is best, but in my mind we have better alternatives. Could Oracle BFILEs (links to documents on a NAS or SAN) be better than BLOBs/CLOBs?

Details:

  • Documents are of various types (PDF, Word, XML)
  • The middle-tier code is written in .NET 2.0 / C#
  • Documents are stored in an Oracle 10g database as compressed BLOBs (on NAS storage)
  • File sizes vary
  • The number of documents is growing drastically and shows no signs of slowing down
  • Inserts are typically in the hundreds per hour during peak
  • Retrievals are typically in the thousands per hour during peak
  • NAS and SAN storage are both available

UPDATE (from questions below):

  • My background is in development
  • Associated metadata about each file is stored alongside the file in the database
asked Feb 04 '09 17:02 by Mike Ohlsen

2 Answers

Based on my experience, I'd say keep them in the database. We've moved two of our systems to this approach.

Putting it in the database means:

  • It's easy to access, even from multiple servers
  • It's backed up automatically (no separate backup job is needed)
  • You don't have to worry about space (people usually make sure the DB doesn't fill the disk, but may forget to monitor wherever the documents are stored)
  • You don't have to have a complicated directory scheme
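A minimal sketch of the in-database approach, using Python's built-in sqlite3 as a stand-in for the question's Oracle 10g setup (an assumption; the table, columns, and function names are hypothetical). Metadata and compressed content sit in the same row, similar in spirit to the question's compressed BLOBs, so a single transaction writes both and a single database backup covers both.

```python
import sqlite3
import zlib


def open_store(path=":memory:"):
    """Create (if needed) a hypothetical table holding metadata and content together."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               filename TEXT NOT NULL,
               content_type TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               body BLOB NOT NULL
           )"""
    )
    return conn


def save_document(conn, filename, content_type, data):
    """Compress the bytes and insert them with their metadata in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO documents (filename, content_type, size_bytes, body)"
            " VALUES (?, ?, ?, ?)",
            (filename, content_type, len(data), zlib.compress(data)),
        )
    return cur.lastrowid


def load_document(conn, doc_id):
    """Fetch a document by id and return (filename, original bytes)."""
    row = conn.execute(
        "SELECT filename, body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    filename, body = row
    return filename, zlib.decompress(body)
```

Because the record and its bytes are one row, there is no way for one to exist without the other.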

We used to keep documents outside the database, and it became a problem once there were a lot of them. A normal directory in Linux occupies one block, which is usually 4K. We had a directory that had grown to 58MB because it contained so many files (it was just a flat directory, no hierarchy), which meant an enormous number of indirect blocks. It took over an hour to delete. It took minutes just to count the files in it. It was abysmal. This was on ext3.

With the filesystem you need:

  • Separate backup mechanism (from the DB backup)
  • To keep things in sync (so the record doesn't exist in the DB without the file being there)
  • A hierarchy for storage (to prevent the problem described above, so that no directory ends up with tens of thousands of files)
  • Some way to view them from other servers if you need a cluster (so probably NFS or some such)
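One common way to build the storage hierarchy mentioned in the list is to hash each document's ID and use the leading hex characters as nested directory names. The sketch below is hypothetical: the function name, the choice of SHA-256, and the two-level, two-character layout are assumptions, and `doc_id` is assumed to already be safe to use as a file name.

```python
import hashlib
import os


def sharded_path(root, doc_id, levels=2, width=2):
    """Map a document id to root/<h0>/<h1>/.../<doc_id> using its SHA-256 hex digest.

    Each level uses `width` hex characters, giving 16**width subdirectories per level.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, doc_id)
```

With two levels of two hex characters there are 256 * 256 = 65,536 leaf directories, so even a million documents average roughly 15 files per directory. Because the hash spreads IDs evenly, no single directory grows the way the flat one above did.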

It's really a pain. For any non-trivial number of documents, I'd recommend against the file system based on what I've seen.

answered Oct 21 '22 04:10 by MBCook


I prefer to store the document in the file system and then store a link to the file and associated file meta-data in the database.

It has proven more convenient, easier to maintain, and less expensive than the alternative.
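A sketch of the approach described here: the bytes go to the file system and the database row holds the path plus metadata. It also handles the sync concern raised in the other answer. The file is written first, and it is deleted again if the row insert fails. sqlite3 stands in for the real database. The schema, function names, random UUID file names, one-level shard directory, and SHA-256 checksum are all assumptions for illustration.

```python
import hashlib
import os
import sqlite3
import tempfile
import uuid


def open_catalog(path=":memory:"):
    """Create (if needed) a hypothetical metadata table that points at files on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               filename TEXT NOT NULL,
               sha256 TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               storage_path TEXT NOT NULL
           )"""
    )
    return conn


def store_document(conn, root, filename, data):
    """Write the bytes under root, then record path + metadata in the database."""
    name = uuid.uuid4().hex  # hypothetical naming scheme: random, collision-resistant
    target_dir = os.path.join(root, name[:2])  # shard to keep directories small
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, name)

    # Write to a temp file in the same directory and rename it into place,
    # so a reader never sees a partially written document.
    fd, tmp = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO documents (filename, sha256, size_bytes, storage_path)"
                " VALUES (?, ?, ?, ?)",
                (filename, hashlib.sha256(data).hexdigest(), len(data), target),
            )
    except BaseException:
        os.remove(target)  # no row, so no file: keep the two stores in sync
        raise
    return cur.lastrowid


def load_document(conn, doc_id):
    """Look up the path in the database, read the file, and verify its checksum."""
    row = conn.execute(
        "SELECT filename, sha256, storage_path FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    filename, digest, path = row
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != digest:
        raise IOError(f"checksum mismatch for document {doc_id}")
    return filename, data
```

This ordering can still leave an orphaned file if the process dies between the rename and the insert. A periodic cleanup job that removes files with no matching row would still be advisable, and the files need their own backup job, separate from the database's.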

answered Oct 21 '22 04:10 by Galwegian