Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server 2008's Filestream location

I have approximately 7 terabytes of various media files (pdf's, jpg's, tiff's) that currently reside on a very beefed up file server. I am looking at moving the data to SQL Server 2008 and using the Filestream attribute to help me manage the data. I want to do this because I have webpages that manage this media, and they (the webpages) are getting slower and slower as more media is added daily to the file server.

EDIT: The webpages are slow because many of them produce reports that reflect various details of the file server and what is stored on it. Essentially, the webpages comb through thousands of folders and files to generate reports about what is contained in them. Some webpages allow users to manipulate folders and files and move them to different locations. So, in a nutshell, I'm looking for a faster manner in managing these files. It would also allow me to maintain metadata about these files within the database, thus allowing me to query the database for this info instead of combing through the file server for it.

My Problems:

1) I have done a proof of concept and verified that I can create a filestream local to the SQL Server 2008 database, and I've successfully read and wrote media to it. However, I have yet to figure out how to use an UNC as a filestream. In other words, the database is hosted on MySQLDB08, and my files are stored on TheFileServer01. I've read it's possible, but I haven't gotten there yet. Any help on this would be greatly appreciated!

2) Since I have 7 terabytes (and growing) of media, will my backups be unmanageable due to their size? Is this something that could dissuade me from using Filestream?

Any suggestions or help would be greatly appreciated!

like image 776
Jagd Avatar asked Feb 09 '10 00:02

Jagd


1 Answers

  1. You can't. Afaik filestream data is stored localy and SQL will refuse to read/write from/to an UNC.
  2. Your full backups will contain the entire filestream data. Unmanageable? Definetly a very serious challenge.

My question would be what is the benefit you want to extract from the filestream? The usual benefits come from BLOB integration with database operations while keeping availability for Win32 file handle based operations:

Even though FILESTREAM technology has many attractive features, it may not be the optimal choice in all situations. As mentioned earlier, the size of the BLOB data and the access patterns are the most significant factors when deciding whether to store the BLOB data wholly within the database or by using FILESTREAM.

Size affects the following:

  • Efficiency with which the BLOB data can be accessed using either storage mechanism. As mentioned earlier, streaming access of large BLOB data is more efficient using FILESTREAM, but partial updates are (potentially much) slower.
  • Efficiency of backing up the combined structured and BLOB data using either storage mechanism. A backup that combines SQL Server database files and a large number of FILESTREAM files will be slower than a backup of just SQL Server database files of an equivalent total size. This is because of the extra overhead of backing up each NTFS file (one per FILESTREAM data value). This overhead becomes more noticeable when the FILESTREAM files are smaller (as the time overhead becomes a larger percentage of the total time to backup per MB of data).

From a pure performance point of view, there are many steps you can do on a file system level to improve performance. What is you current problem, why is your system throughput affected by the media size? It means you have a somewhere a choke point of contention, perhaps a directory enumeration, or some other barrier that causes you to scale the response time with the media size. Your access to the media should be O(1), maybe O(logn),b ut definetely not O(n).

I'd recommend you go over the SQL White Paper FILESTREAM Storage in SQL Server 2008, from where I found my quote about use cases.

like image 55
Remus Rusanu Avatar answered Sep 23 '22 10:09

Remus Rusanu