
Storing PDF files as binary objects in SQL Server, yes or no?

I have to make a design decision for the following task:

I have a SQL Server database that contains a table of orders. Users will upload PDF documents through a simple file upload on a web page, and each document is assigned to an order. An order has at most one document (possibly none, never more than one). To upload, a user opens a web page, enters an order number, gets the order displayed and clicks an upload button. So I know which order the uploaded document belongs to.

Now I am considering two options to store the documents on the web server:

1) Extend my table of orders by a varbinary(MAX) column and store the PDF document directly in that column.

2) Save the PDF file in a specific folder on disk and give it a unique name related to the order (for instance my order number, which is a primary key in the database, or a GUID which I could store in an additional column of the order table). Perhaps I have to store the files in subfolders, one per month, and store the subfolder name in the order row in the database, to avoid ending up with many thousands of files in one folder.
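To make option (2) concrete, here is a minimal sketch of the naming scheme described above: one subfolder per month plus a GUID-based file name. It is written in Python purely for illustration (the application itself is ASP.NET); the root folder `D:/order-documents` and the function name `document_path` are hypothetical.

```python
import uuid
from datetime import date
from pathlib import PureWindowsPath


def document_path(root: str, uploaded_on: date, file_id: uuid.UUID) -> PureWindowsPath:
    """Build <root>/<YYYY-MM>/<guid>.pdf for an uploaded document."""
    # One subfolder per month keeps any single folder to roughly
    # one month's worth of uploads.
    subfolder = uploaded_on.strftime("%Y-%m")
    return PureWindowsPath(root) / subfolder / f"{file_id}.pdf"


# Hypothetical root folder; the GUID would be stored in the order row.
file_id = uuid.uuid4()
path = document_path("D:/order-documents", date(2010, 2, 27), file_id)
```

With this layout only the GUID needs to be stored, provided the upload date is also kept in the order row; otherwise the subfolder name has to be stored as well.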

After the PDF files are stored, they can be downloaded and viewed in the browser by entering the related order number.

I'm tending towards option (1) because data management seems easier to me with all relevant data in one database. But I am a bit afraid that I could run into performance issues over time, since my database will grow much faster than with option (2). Around 90% or even 95% of the total database size would be made up of those stored PDF files.

Here is some additional information:

  • The PDF files will have a size of around 100 KB each
  • Around 1500 orders/PDF files per month
  • Windows Server 2008 R2 / IIS 7.5
  • SQL Server 2008 SP1 Express
  • Not quite sure about the hardware, I believe one quad-core processor and 4 GB RAM
  • Application is written in ASP.NET Webforms 3.5 SP1

(I am aware that I will reach the 4 GB limit of the SQL Server Express edition after around 2 years with the numbers above. But we can disregard this here; either removing old data from the database or upgrading to a full license will be an option.)
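As a rough check of that estimate, the arithmetic can be sketched as follows. It uses only the figures given above (100 KB per PDF, 1500 PDFs per month, 4 GB limit) and ignores all non-PDF data, so the real limit would be reached somewhat earlier. The function name is hypothetical.

```python
PDF_SIZE_KB = 100        # average PDF size from the list above
ORDERS_PER_MONTH = 1500  # uploads per month from the list above
EXPRESS_LIMIT_GB = 4     # database size limit of the Express edition


def months_until_limit(pdf_size_kb: float, orders_per_month: float, limit_gb: float) -> float:
    """Months until the PDFs alone fill the database size limit."""
    monthly_kb = pdf_size_kb * orders_per_month  # 150,000 KB per month
    limit_kb = limit_gb * 1024 * 1024            # GB -> KB
    return limit_kb / monthly_kb


months = months_until_limit(PDF_SIZE_KB, ORDERS_PER_MONTH, EXPRESS_LIMIT_GB)
print(f"about {months:.0f} months, i.e. {months / 12:.1f} years")
```

That works out to just under 28 months of PDF data alone, which is consistent with "around 2 years" once the remaining 5 to 10% of other data is taken into account.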

My question is: What are the pros and cons of the two options, and what would you recommend? Perhaps someone has had a similar task and can report on their experience.

Thank you in advance for your replies!

Related:

Storing Images in DB - Yea or Nay?

Slauma asked Feb 27 '10 15:02



1 Answer

With SQL Server 2008, when you have documents that are mostly 1 MB or more in size, the FILESTREAM feature would be recommended. This is based on a paper published by Microsoft Research called "To BLOB or not to BLOB", which analyzes the pros and cons of storing blobs in a database at great length. Great read!

For documents of less than 256K on average, storing them in a VARBINARY(MAX) column seems to be the best fit.

Anything in between is a bit of a toss-up, really.
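The rule of thumb above can be written down as a small helper. This is only a sketch of the thresholds mentioned here (under 256K, 1 MB or more, and the range in between); the function name and return strings are made up for illustration and are not part of any SQL Server API.

```python
KB = 1024
MB = 1024 * KB


def suggested_storage(avg_size_bytes: int) -> str:
    """Map an average document size to the storage option suggested above."""
    if avg_size_bytes < 256 * KB:
        return "VARBINARY(MAX)"  # small documents: store in the table
    if avg_size_bytes >= 1 * MB:
        return "FILESTREAM"      # large documents: FILESTREAM recommended
    return "toss-up"             # in between: no clear winner


print(suggested_storage(100 * KB))  # the 100K PDFs from the question
```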

You say you'll have PDF documents mostly around 100K or so, and those will store very nicely in a SQL Server table, no problem. One thing you might want to consider is having a separate table for the documents that is linked to the main facts table. That way, queries against the facts table stay fast, and the documents don't get in the way of your other data.
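A minimal sketch of that separate-table layout, using Python's built-in sqlite3 module as a stand-in so the example is runnable anywhere. This is not SQL Server: SQLite's BLOB type takes the place of VARBINARY(MAX), and the table and column names (`orders`, `order_documents`, `content`) are assumptions, not taken from the question.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
-- order_id is both primary key and foreign key, so each order
-- can have at most one document (or none at all).
CREATE TABLE order_documents (
    order_id INTEGER PRIMARY KEY REFERENCES orders(order_id),
    content  BLOB NOT NULL  -- stand-in for VARBINARY(MAX)
);
""")


def save_document(conn: sqlite3.Connection, order_id: int, pdf_bytes: bytes) -> None:
    """Store (or replace) the single document belonging to an order."""
    conn.execute(
        "INSERT OR REPLACE INTO order_documents (order_id, content) VALUES (?, ?)",
        (order_id, pdf_bytes),
    )


def load_document(conn: sqlite3.Connection, order_id: int) -> Optional[bytes]:
    """Return the document bytes for an order, or None if it has none."""
    row = conn.execute(
        "SELECT content FROM order_documents WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO orders (order_id, customer) VALUES (1, 'ACME')")
conn.execute("INSERT INTO orders (order_id, customer) VALUES (2, 'Globex')")
save_document(conn, 1, b"%PDF-1.4 sample bytes")
```

Queries that only list or filter orders never touch the document table, and the shared primary key enforces the "at most one document per order" rule from the question.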

marc_s answered Oct 14 '22 08:10