
CLR SQL Server performance

We are using CLR functions in our ETL processes to centralize specific data-conversion and data-checking logic. These functions are rather basic: they require no data access and are deterministic, and therefore allow parallelism.

For instance:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static bool check_smallint(string input)
{
    string teststring;
    try
    {
        teststring = input.Trim(' ').Replace('-', '0');
        if (teststring.Length == 0)
        {
            teststring = "0";
        }
        Convert.ToInt16(teststring);
    }
    catch (NullReferenceException)
    {
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
    catch (OverflowException)
    {
        return false;
    }
    return true;
}

This works fine except for performance. Queries have slowed down considerably, which is causing trouble in processing large datasets (millions of rows and more).

So far we have found no one who really understands the SQL CLR architecture, but one suggestion we received is that the slowdown might be caused by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling.

Please don't suggest different solutions such as inline SQL or a completely different approach; we are already considering them. Standard SQL functions are in many cases not an option because they cannot raise errors.

PS. We are using SQL Server 2008 R2.

Rudolf van der Heide asked Apr 19 '26 11:04


2 Answers

"... by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling."

That is not something you have to worry about on the C# side. You are not allocating memory in any way that could be pooled (of course you allocate the strings and other objects you need inside your function, but none of that can be pooled or reused). The connection isn't something you have to worry about either.

"This works fine except for performance."

Your code is doing something incredibly...EXCEPTIONALLY...slow: throwing exceptions instead of performing checks. Throwing an exception is an expensive operation and should be reserved for exceptional situations (just 100 or 200 records with a null or invalid value are enough to slow down a query over 1,000,000 records). A wrong input format or null values in a database column aren't exceptional. This programming style (exceptions instead of checks) is allowed and even encouraged in other languages such as Python, but I'd generally avoid it in C#, and it is certainly not appropriate here, where performance is an issue.
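To get a feel for the difference, a rough micro-benchmark along the following lines can be run as a console program outside SQL Server. It is only a sketch: the names `ExceptionCostSketch`, `CheckWithExceptions`, `CheckWithTryParse` and `Time`, the iteration count, and the sample inputs are all made up for illustration, and timings taken this way say nothing exact about the SQLCLR host.

```csharp
using System;
using System.Diagnostics;

public static class ExceptionCostSketch
{
    // Exception-driven check, shaped like the code in the question
    // (hypothetical helper, not the asker's exact function).
    public static bool CheckWithExceptions(string input)
    {
        try
        {
            Convert.ToInt16(input);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
        catch (OverflowException)
        {
            return false;
        }
    }

    // Same decision made with a check instead of a throw.
    public static bool CheckWithTryParse(string input)
    {
        short ignored;
        return Int16.TryParse(input, out ignored);
    }

    // Calls `check` on the same input `iterations` times and returns the elapsed time.
    public static TimeSpan Time(Func<string, bool> check, string input, int iterations)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 100000;
        // One valid value, one badly formatted value, one out-of-range value.
        string[] inputs = { "123", "abc", "99999" };
        foreach (string input in inputs)
        {
            TimeSpan viaExceptions = Time(CheckWithExceptions, input, iterations);
            TimeSpan viaTryParse = Time(CheckWithTryParse, input, iterations);
            Console.WriteLine("{0}: exceptions {1} ms, TryParse {2} ms",
                input, viaExceptions.TotalMilliseconds, viaTryParse.TotalMilliseconds);
        }
    }
}
```

Both helpers return the same answer for every non-null input; only the cost of reaching a "false" differs, which is why the gap should show up on the invalid inputs rather than on "123".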

public static bool check_smallint(string input)
{
    if (String.IsNullOrWhiteSpace(input))
        return true;

    short value;
    return Int16.TryParse(input, out value);
}

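Note that String.IsNullOrWhiteSpace(input) returns true for null inputs and for strings made only of whitespace (replacing your Trim() and NullReferenceException handling). Everything else (input text that is not an integer, which used to raise FormatException, or a number that is too big, which used to raise OverflowException) is handled by Int16.TryParse(). The code is shorter (and slightly faster) for valid inputs, but it is many times faster for invalid ones. One caveat: String.IsNullOrWhiteSpace was added in .NET Framework 4, while SQL Server 2008 R2 hosts CLR 2.0 (.NET 2.0 to 3.5), so there the first test has to be written as input == null || input.Trim().Length == 0.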
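The shorter version above does not treat dashes the way the function in the question does: the original replaces every '-' with '0' before converting, so an input of "-" counts as valid there, while Int16.TryParse rejects it. If that dash handling is intentional (an assumption; the question does not say), a variant along these lines keeps it while still avoiding exceptions. The class name `SmallIntCheckSketch` and the method name `CheckSmallInt` are hypothetical, and the code avoids String.IsNullOrWhiteSpace so that it also compiles against .NET 3.5.

```csharp
using System;

public static class SmallIntCheckSketch
{
    // Mirrors the question's rules without throwing:
    //   null        -> true  (the original caught NullReferenceException)
    //   "" or "   " -> true  (the original substituted "0")
    //   '-'         -> replaced by '0' before parsing, as in the original
    public static bool CheckSmallInt(string input)
    {
        if (input == null)
        {
            return true;
        }

        string candidate = input.Trim(' ').Replace('-', '0');
        if (candidate.Length == 0)
        {
            return true;
        }

        short ignored;
        return Int16.TryParse(candidate, out ignored);
    }

    public static void Main()
    {
        string[] samples = { null, "", "-", "-5", "123", "abc", "40000" };
        foreach (string sample in samples)
        {
            Console.WriteLine("[{0}] -> {1}", sample ?? "null", CheckSmallInt(sample));
        }
    }
}
```

Whether "-" should really pass is a question about the ETL rules rather than about performance, so this is worth checking against the data before choosing between the two versions.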

Adriano Repetti answered Apr 22 '26 02:04


I am making this a separate answer instead of a comment on @Adriano's answer so that it is less likely to be missed (since not everyone reads all of the comments).


In addition to changing the approach as suggested by @Adriano, you should really be using the appropriate datatypes, found in the System.Data.SqlTypes namespace, for all input/output parameters and return values. There are some important differences and benefits to using them, such as all of them having an .IsNull property. The full list of differences is too much to put here, but I did document it in the following article: "Stairway to SQLCLR Level 5: Development (Using .NET within SQL Server)".

Adapting @Adriano's code to use the proper types would give you the following:

public static SqlBoolean check_smallint(SqlString input)
{
    if (input.IsNull)
        return true;

    if (input.Value.Trim() == String.Empty)
        return true;

    short value;
    return Int16.TryParse(input.Value, out value);
}
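As a usage sketch, the function above can be exercised from a small console program without deploying it to SQL Server, since System.Data.SqlTypes is an ordinary .NET namespace. The wrapper class `SqlTypesUsageSketch` and the helper `ValueThrowsOnNull` are made up for this illustration. The helper shows why the IsNull test has to come first: reading .Value from a NULL SqlString throws SqlNullValueException.

```csharp
using System;
using System.Data.SqlTypes;

public static class SqlTypesUsageSketch
{
    // Same logic as the SqlTypes version above, repeated so the sketch is self-contained.
    public static SqlBoolean CheckSmallInt(SqlString input)
    {
        if (input.IsNull)
            return true;

        if (input.Value.Trim() == String.Empty)
            return true;

        short value;
        return Int16.TryParse(input.Value, out value);
    }

    // Returns true if reading .Value from a NULL SqlString throws.
    public static bool ValueThrowsOnNull()
    {
        try
        {
            Console.WriteLine(SqlString.Null.Value);
            return false;
        }
        catch (SqlNullValueException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CheckSmallInt(SqlString.Null).IsTrue);        // NULL is treated as valid
        Console.WriteLine(CheckSmallInt(new SqlString("   ")).IsTrue);  // whitespace is treated as valid
        Console.WriteLine(CheckSmallInt(new SqlString("123")).IsTrue);  // fits in a smallint
        Console.WriteLine(CheckSmallInt(new SqlString("abc")).IsTrue);  // not a number
        Console.WriteLine(ValueThrowsOnNull());
    }
}
```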
Solomon Rutzky answered Apr 22 '26 02:04


