I have used CLR stores procedures in SQL server for awhile now, but I'm still wondering what the best situations to use them are.
MSDN provides some guidelines for use like heavy string manipulation (regex), or replacing T-SQL that declares lots of table variables and cursors. I'm curious to know what problems SO users are solving with CLR stored procs, and examples / benchmarks as well.
For example, I have found CLR stored procs + SSRS to be a great way to get data manipulation logic out of SSRS and out of T-SQL, and into managed code that is easier to read and manipulate.
Many problems requiring denormalization and/or sequential operations can be handled exceptionally well by the CLR and can be used to dramatically improve performance without sacrificing usability on the SQL end (much). Instead of relying entirely on either set-based or iterative operations, you can take a hybrid approach, use a set-based solution for the big hauls and switch to an iterative model for the tight loops.
The built-in hierarchyid
and geospatial (i.e. geography
) types in SQL Server 2008 are good examples of the denormalization problem. Both contain an (almost) arbitrarily large amount of data that are difficult to normalize without hurting performance - you would need to use recursion or cursors to do any meaningful work with them otherwise, or use a rat's nest of triggers and/or scheduled tasks to maintain a denormalization table.
Another problem I've solved with CLR types is inline compression. This might sound like a pointless or academic exercise, but when your fully-normalized data is pushing into the terabytes, an 80-90% reduction in size means a lot. SQL has its own built-in compression now and SQL 2005 had vardecimal, and those are good tools as well, but a domain-aware "minimization" algorithm can be several times more efficient in terms of both CPU load and compression rate. Obviously this doesn't apply to every problem, but it applies to some.
Yet another very common problem often found on this site is generating a sequence on the fly - for example a sequence of consecutive dates. Common solutions are recursive CTEs, static sequence tables, and the little-known spt_values
tables, but a simple CLR UDF performs better than any of them and offers a lot more flexibility.
Last on my list: User-defined streaming aggregates are also very useful, especially for anything statistics-related. There are some things you simply cannot compose out of the built-in SQL aggregates, such as medians, weighted moving averages, etc. UDAs can also take multiple arguments so you can parameterize them; technically an aggregate isn't guaranteed to receive data in any particular order in the current version of SQL Server, but you can get around that limitation by feeding it a ROW_NUMBER
as an additional argument and use this to implement just about any windowing function (have the aggregate spit out a UDT which can then be transformed to a table).
It's actually very frustrating how few examples there are of truly useful SQL-CLR applications; search on Google and you'll get 10 million results, every single one of them for some silly string-concatenation or regex. These are useful, but take a few minutes to learn about SQL UDTs and UDAs in particular and you'll start seeing plenty of uses for them in your own applications. Don't go nuts, of course - think carefully about whether or not there's a better solution in pure SQL - but don't discount them either.
String manipulation - regular expression search is a classic. Very easy to expose in CLR, very difficult to do in straight T-SQL.
See this link for details of implementation and a micro-benchmark (SQLCLR is only 47 milliseconds compared to 6.187 seconds for the T-SQL UDF
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With