
SQL Server 2008 FILESTREAM performance

I had some questions around the FILESTREAM capability of SQL Server 2008.

  1. What would the difference in performance be of returning a file streamed from SQL Server 2008 using the FILESTREAM capability vs. directly accessing the file from a shared directory?

  2. If 100 users requested 100 100 MB files (stored via FILESTREAM) within a 10-second window, would SQL Server 2008 performance slow to a crawl?

John Livermore asked Dec 29 '09 21:12



2 Answers

If 100 users requested 100 100 MB files (stored via FILESTREAM) within a 10-second window, would SQL Server 2008 performance slow to a crawl?

On what kind of server? What hardware serves those files? What kind of disks, network, etc.? The answer depends on all of these.

There's a really good blog post by Paul Randal on SQL Server 2008: FILESTREAM Performance - check it out. There's also a 25-page whitepaper on FILESTREAM available - also covering some performance tuning tips.


But also check out the Microsoft Research TechReport To BLOB or Not To BLOB.

It's a thorough, well-founded article that puts all those questions through their paces.

Their conclusion:

The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.

So judging from that: if your blobs are typically less than 1 MB, just store them as a VARBINARY(MAX) in the database. If they're typically larger, then just use the FILESTREAM feature.
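As a rough illustration, the thresholds from the quoted conclusion can be written as a small decision helper. This is only a sketch in Python, not something taken from the paper or from SQL Server: the function name and the returned labels are made up for the example, and the middle range is deliberately reported as "depends" because the paper says it hinges on write intensity and storage age.

```python
KB = 1024
MB = 1024 * KB


def suggest_blob_storage(avg_object_bytes: int) -> str:
    """Map an average object size to the paper's rough recommendation.

    Hypothetical helper: the labels are illustrative, not an API.
    """
    if avg_object_bytes < 0:
        raise ValueError("size must be non-negative")
    if avg_object_bytes < 256 * KB:
        return "database"    # e.g. a VARBINARY(MAX) column
    if avg_object_bytes > 1 * MB:
        return "filesystem"  # e.g. FILESTREAM on NTFS
    return "depends"         # write intensity and storage age decide
```

For example, `suggest_blob_storage(100 * MB)` falls in the "filesystem" branch, which matches the file size in the question.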

I would worry less about performance than about the other benefits of FILESTREAM over "unmanaged" storage in an NTFS folder. When you store files outside the database without FILESTREAM, you have no control over them:

  • no access control provided by the database
  • the files aren't part of your SQL Server backup
  • the files aren't handled transactionally, e.g. you could end up with "zombie" files which aren't referenced from the database anymore, or "skeleton" entries in the database without the corresponding file on disk

Those features alone make it absolutely worthwhile to use FILESTREAM.
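To make the transactional point concrete: with unmanaged storage you would need some kind of reconciliation job to notice when the table and the folder have drifted apart. A minimal sketch in Python, assuming you can already list the paths referenced by the database and the paths present on disk (both inputs, and the function name, are hypothetical):

```python
def find_inconsistencies(db_paths, disk_paths):
    """Compare paths referenced in the database with paths found on disk.

    Both arguments are plain iterables of path strings; how they are
    obtained (a SELECT, a directory walk) is left out of this sketch.
    """
    db, disk = set(db_paths), set(disk_paths)
    return {
        # files on disk that no row references any more
        "zombie_files": sorted(disk - db),
        # rows whose file is missing on disk
        "skeleton_rows": sorted(db - disk),
    }
```

With FILESTREAM the file data is written and deleted under the same transaction as the row, so this kind of cleanup pass should not be needed.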

marc_s answered Nov 09 '22 12:11


Reading a FILESTREAM over Win32 is quite fast. See Managing FILESTREAM Data by Using Win32. You should follow the FILESTREAM best practices, though. After all, this is what powers SharePoint, and MS would not bet something as important as Office (==SharePoint) on underperforming storage. There are some case studies and white papers around FILESTREAM; I could only dig up Laren Electronics Fuels Analysis of Formula One Racing Data with SQL Server, but I know there are others with more detailed numeric data. If I recall correctly, it shows that FILESTREAM in general tracks SMB performance closely, at about 90-95% of it, above a certain file size. For small files, the overhead of obtaining the FILESTREAM API handle starts to show up.

I'd also second Marc in recommending the research paper on the topic (there is also a Channel 9 interview with Catharine van Ingen, available on iTunes podcasts too, where she speaks about this work), but bear in mind that the paper was published in 2006, before FILESTREAM was officially released, so it does not consider the FILESTREAM specifics.

As for your second question, asking about performance while specifying only the load and not the capacity of the system is meaningless. A 128-CPU Superdome with a mountain of SAN storage won't even notice your load. A SQL Server running on a laptop with 256 MB of RAM and a mountain of spyware won't even get to see your load...
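What you can do without knowing the hardware is work out what the load demands and compare it with whatever capacity you do have. A back-of-the-envelope sketch in Python, assuming the pessimistic reading that 100 files of 100 MB each must all be delivered within the 10-second window, and assuming roughly 125 MB/s per Gbit/s of network link with protocol overhead ignored (the function names are made up for the example):

```python
def required_throughput_mb_per_s(n_files, file_size_mb, window_s):
    """Aggregate throughput needed to deliver every file within the window."""
    return n_files * file_size_mb / window_s


def can_sustain(required_mb_per_s, link_gbit_per_s):
    """Rough check against a network link; 1 Gbit/s is about 125 MB/s."""
    return required_mb_per_s <= link_gbit_per_s * 125


needed = required_throughput_mb_per_s(100, 100, 10)  # 1000.0 MB/s
```

Under these assumptions a single 1 Gbit/s link cannot carry the load while a 10 Gbit/s link can, before disks, memory, or SQL Server itself are even considered. The same comparison has to be repeated for each of those components.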

Remus Rusanu answered Nov 09 '22 13:11