I have to find a design decision for the following task:
I have a SQL Server database and it contains a table of orders. PDF documents will be uploaded by users through a simple file upload from a web page and assigned to an order. There is not more than one document per order (perhaps no document, never more than one). For this purpose a user opens a web page, enters an order number, gets the order displayed and clicks on an upload button. So I know to which order the uploaded document belongs to.
Now I am considering two options to store the documents on the web server:
1) Extend my table of orders by a varbinary(MAX) column and store the PDF document directly into that binary field.
2) Save the PDF file in a specific folder on disk and give it a unique name related to the order (for instance my order number which is a primary key in the database, or a GUID which I could store in an additional column of the order table). Perhaps I have to store the files in subfolders, one per month, and store the subfolder name into the order row in the database, to avoid getting too many thousand files in one folder.
After the PDF files are stored they can be downloaded and viewed via browser after entering the related order number.
I'm tending towards option (1) because the data management seems easier to me having all relevant data in one database. But I am a bit afraid that I could encounter performance issues over time since my database size will grow much faster than with solution (2). Around 90% or even 95% of the total database size would be made up only by those stored PDF files.
Here is some additional information:
(I am aware that I will reach the 4GB-limit of the SQL Server Express edition after around 2 years with the numbers above. But we can disregard this here, either removing old data from the database or upgrading to a full license will be a possible option.)
My question is: What are the Pro and Contras of the options and what would you recommend? Perhaps someone had a similar task and can report about his experience.
Thank you in advance for reply!
Related:
Storing Images in DB - Yea or Nay?
You say you'll have PDF documents mostly around 100K or so -> those will store very nicely into a SQL Server table, no problem.
Storing and Displaying Files in a Database For example, in MySQL, the datatype that accepts PDF bytes is a LongBlob datatype, so you will need to set the PDF Data column to the LongBlob datatype. MS SQL accepts the Varbinary datatype, so you'll need to set the PDF Data column to a Varbinary datatype.
PDF files are either 8-bit binary files or 7-bit ASCII text files (using ASCII-85 encoding). Every line in a PDF can contain up to 255 characters.
You can store the PDF inside of a table using a varbinary field and an extension field. Then you can take advantage of the Fulltext serch engine to search inside of the PDFs. You will have to install a PDF iFilter in your SQL server.
With SQL Server 2008, when you have documents that are mostly 1 MB or more in size, the FILESTREAM feature would be recommended. This is based on a paper published by Microsoft Research called To BLOB or not to BLOB which analyzed the pros and cons of storing blobs in a database in great length - great read!
For documents of less than 256K on average, storing them in a VARBINARY(MAX)
column seems to be the best fit.
Anything in between is a bit of a toss-up, really.
You say you'll have PDF documents mostly around 100K or so -> those will store very nicely into a SQL Server table, no problem. One thing you might want to consider is having a separate table for the documents that is linked to the main facts table. That way, the facts table will be faster in usage, and the documents don't get in the way of your other data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With