Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the Collation of string Fields in a Result Set via SqlDataReader?

When retrieving a Result Set from a SqlDataReader, it is possible to get most of the meta-data for that Result Set by using the GetSchemaTable method. One thing that is missing, however, is the Collation of the string fields (i.e. CHAR, VARCHAR, NCHAR, NVARCHAR, SQL_VARIANT – if containing a string type – and even the deprecated TEXT and NTEXT). Is there any way at all to get this info?

While it is technically possible to call the GetSqlString method which has the LCID and SqlCompareOptions properties, those properties return the values related to the default Collation of the Database, which may or may not be the Collation of any particular field (even though most people seem to just assume that it is the same) or expression (due to Collation Precedence). Those properties only provide accurate info when the source data type is SQL_VARIANT, which is not terribly helpful.

The problem with not having this extra info (at the very least the "LCID" which equates to the Locale) is that while all characters, regardless of source encoding, can be represented in .NET without loss (since .NET strings are UTF-16 Little Endian), it is impossible to determine what Locale to use when comparing fields in the result set to other strings.

The Collation info, per each string field, is definitely part of the result set meta-data sent to the client from SQL Server via the TDS stream. It is used:

  • to expose the LCID via the GetLocaleId method of the SqlDataReader class. But that method is marked internal so I don't have access to it. It seems to only be used in one place: SqlBulkCopy.
  • to provide the Result Set structure to the Send(SqlDataReader) method of SqlPipe, which is used in SQLCLR Stored Procedures and Triggers.
  • to set the proper encoding, depending on circumstances, in the GetTextReader() method of SqlDataReader. However, while that method is public, the TextReader class does not have a property for Encoding; the encoding info was only used internally.

UPDATE

To clarify: the desire is to have a means of getting this information that does not require the Assembly to be Fully Trusted / UNSAFE. The intended use is code that runs in both a Console App as well as SQLCLR objects (Stored Procedures, Functions, etc). If the Assembly (when loaded into SQL Server) is required to have a PERMISSION_SET of EXTERNAL_ACCESS, that would be acceptable. But requiring the SQLCLR Assembly to be marked as UNSAFE will not work.

The ultimate supreme ideal would be getting the full Collation name as it exists in SQL Server (e.g. Latin1_General_100_BIN2), and with the Assembly, within SQL Server, being marked as SAFE.

UPDATE 2

Using Reflection, as per @Jonathan's answer, it is possible to invoke the "internal" GetLocaleId method, and it does return the correct LCID. However, when using this code in a SQLCLR object, it will get the following exception if the Assembly is not marked as UNSAFE:

Msg 6522, Level 16, State 1, Line 9
A .NET Framework error occurred during execution of user-defined routine or aggregate "GetFieldCollation":

System.MethodAccessException: Attempt by method 'UserDefinedFunctions.GetFieldCollation(System.Data.SqlTypes.SqlString, System.Data.SqlTypes.SqlBoolean)' to access method 'System.Data.SqlClient.SqlDataReader.GetLocaleId(Int32)' failed.

 System.MethodAccessException:
  at System.RuntimeMethodHandle.PerformSecurityCheck(Object obj, RuntimeMethodHandleInternal method, RuntimeType parent, UInt32 invocationFlags)
  at System.RuntimeMethodHandle.PerformSecurityCheck(Object obj, IRuntimeMethodInfo method, RuntimeType parent, UInt32 invocationFlags)
  at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
  at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)

Looking at this MSDN article, Walkthrough: Emitting Code in Partial Trust Scenarios, it even mentions (emphasis added):

Adding RestrictedMemberAccess to Sandboxed Domains
...
For example, a host might grant Internet applications Internet permissions plus RMA, so that an Internet application can emit code that accesses private data in its own assemblies. Because the access is limited to assemblies of equal or lesser trust, an Internet application cannot access members of fully trusted assemblies such as .NET Framework assemblies.

Unfortunately, requiring the Assembly to be UNSAFE goes against the requirements.

And, to be honest, this is really just part of the overall puzzle. As stated in the first UPDATE section, the goal is to have this and the true Collation Name, and that does not seem to exist anywhere at the moment. Hence, I sent the following suggestion with Microsoft:

Expose Collation Info of SQL Server Result Set via SqlDataReader (link no longer valid due to visualstudio.uservoice.com being shut down)

UPDATE 3

To clarify: ideally there would be a method or property that returns the exact collation used for a given string field (e.g. Latin1_General_100_CI_AS_KS_WS_SC). However, merely exposing LCID and SqlCompareOptions is likely not good enough as those properties do not convey the following info:

  • collation version (currently 80 (i.e. unspecified), 90, 100, or 140 / or even the alternate 0, 1, 2, or 3 would be good enough)
  • _BIN vs _BIN2
  • CodePage
  • _VSS (Variation Selector Sensitive; starting in SQL Server 2017 via the Japanese version 140 collations)
  • _SC (Supplementary Character Aware -- i.e. allowing built-in functions to deal with NVARCHAR data as UTF-16 instead of UCS-2)
  • SQL Server collation vs Windows collation
  • SortID (only used for SQL Server collations)
  • _UTF8 (UTF-8 encoding for VARCHAR data; new in SQL Server 2019)

At bare minimum the additional info sent in the Tabular Data Stream (TDS) needs to be exposed. That info is:

  • collation version (currently 80 (i.e. unspecified), 90, 100, or 140 / or even the alternate 0, 1, 2, or 3 would be good enough)
  • _BIN vs _BIN2
  • SortID (only used for SQL Server collations)
  • _UTF8 (UTF-8 encoding for VARCHAR data; new in SQL Server 2019)

The SortID, because it's only used for SQL Server collations, gets us:

  • SQL Server collation vs Windows collation

The SortID or LCID (depending on type of collation), can be used to derive:

  • CodePage

That still leaves the following options currently unaccounted for:

  • _VSS (Variation Selector Sensitive; starting in SQL Server 2017 via the Japanese version 140 collations)
  • _SC (Supplementary Character Aware -- i.e. allowing built-in functions to deal with NVARCHAR data as UTF-16 instead of UCS-2)

So, perhaps this is not technically possible at the moment, but I figured I would ask just in case there was something I was missing. It just seems quite odd to me that this info isn't already available.

like image 889
Solomon Rutzky Avatar asked Nov 08 '22 20:11

Solomon Rutzky


1 Answers

You can access internal property easily with reflection or expression tree (for performance)

var method = typeof (SqlDataReader).GetMethod("GetLocaleId", BindingFlags.NonPublic | BindingFlags.Instance);
using (var conn = new SqlConnection(My.Config.ConnectionStrings.SqlConnection))
{
    using (var command = new SqlCommand("SELECT TOP 0 * FROM TestTable", conn))
    {
        conn.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            var schema = reader.GetSchemaTable();

            // 1033 = Latin1_General_CI_AS (confirmed)
            var collation2 = method.Invoke(reader, new object[] { 2 });

            // 1048 = Romanian_CI_AS (Current : SQL_Romanian_CP1250_CI_AS, close enough!)
            var collation3 = method.Invoke(reader, new object[] { 3 });
        }
    }
}

Collation: https://msdn.microsoft.com/en-us/library/ms143508(v=sql.105).aspx

Edit

You can access to all internal, private or anything else via reflection or expression tree.

You can also use Eval Expression.NET to make it even more easier!

// using Z.Expressions;

var getLocalIdCompiled = Eval.Compile<Func<SqlDataReader, int, int>>("reader.GetLocaleId(value)", "reader", "value");

using (var conn = new SqlConnection(My.Config.ConnectionStrings.SqlConnection))
{
    using (var command = new SqlCommand("SELECT TOP 0 * FROM TestTable", conn))
    {
        conn.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            var schema = reader.GetSchemaTable();

            var collation2 = getLocalIdCompiled(reader, 2);
            var collation3 = getLocalIdCompiled(reader, 3);
        }
    }
}

Disclaimer: I'm the owner of the project Eval Expression.NET

like image 85
Jonathan Magnan Avatar answered Nov 14 '22 21:11

Jonathan Magnan