Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Case-insensitive primary key of type nvarchar where ß != ss

This question "Need a case insensitive collation where ss != ß" solves it for varchar type column, but mine has to be nvarchar.

As far as I can gather, SQL_Latin1_General_Cp437_BIN differentiates between ß and ss. But it is also case-sensitive. Mine is a primary key column which also needs to be case INsensitive:

I need e.g. weiß/Weiß to be considered equal, and also weiss/Weiss, but NOT weiß/weiss nor Weiß/Weiss nor weiß/Weiss etc.

I've searched a lot for this, and is it really so that I'm out of luck here? There has to be a lot of people in the same situation, I just can't believe this could be unsolvable.

like image 937
David S. Avatar asked Oct 19 '22 21:10

David S.


1 Answers

Ah, but it is possible as there is a collation that handles this naturally (several actually, but all in the same "family"): Hungarian_Technical_* (well, minus the CaseSensitive (CS) and Binary (BIN / BIN2) variations). Of course, I'm still not sure this is a good choice for a PK, but no reason it can't be a UNIQUE INDEX.

SELECT tmp.*,
       IIF(tmp.[First] = tmp.[Second] COLLATE Hungarian_Technical_CI_AI,
           N'Equal',
           N'NOT EQUAL') AS [Result]
FROM (
      SELECT N'weiß', N'Weiß', N'Equal'
      UNION ALL
      SELECT N'weiss', N'Weiss', N'Equal'
      UNION ALL
      SELECT N'weiß', N'weiss', N'NOT Equal'
      UNION ALL
      SELECT N'Weiß', N'Weiss', N'NOT Equal'
      UNION ALL
      SELECT N'weiß', N'Weiss', N'NOT Equal'
      ) tmp ([First], [Second], [Ideal]);

Results:

First   Second  Ideal       Result
weiß    Weiß    Equal       Equal
weiss   Weiss   Equal       Equal
weiß    weiss   NOT Equal   NOT EQUAL
Weiß    Weiss   NOT Equal   NOT EQUAL
weiß    Weiss   NOT Equal   NOT EQUAL

There are 24 total collations that will work in this manner. You can find them via:

SELECT hc.*
FROM sys.fn_HelpCollations() hc
WHERE hc.name LIKE N'Hungarian[_]Technical%'
AND hc.name LIKE N'%[_]CI%'
ORDER BY hc.name;

They are:

Hungarian_Technical_100_CI_AI Hungarian_Technical_100_CI_AI_KS
Hungarian_Technical_100_CI_AI_KS_SC
Hungarian_Technical_100_CI_AI_KS_WS
Hungarian_Technical_100_CI_AI_KS_WS_SC
Hungarian_Technical_100_CI_AI_SC Hungarian_Technical_100_CI_AI_WS
Hungarian_Technical_100_CI_AI_WS_SC Hungarian_Technical_100_CI_AS
Hungarian_Technical_100_CI_AS_KS Hungarian_Technical_100_CI_AS_KS_SC
Hungarian_Technical_100_CI_AS_KS_WS
Hungarian_Technical_100_CI_AS_KS_WS_SC
Hungarian_Technical_100_CI_AS_SC Hungarian_Technical_100_CI_AS_WS
Hungarian_Technical_100_CI_AS_WS_SC Hungarian_Technical_CI_AI
Hungarian_Technical_CI_AI_KS Hungarian_Technical_CI_AI_KS_WS
Hungarian_Technical_CI_AI_WS Hungarian_Technical_CI_AS
Hungarian_Technical_CI_AS_KS Hungarian_Technical_CI_AS_KS_WS
Hungarian_Technical_CI_AS_WS

If the version 100 collations are available (meaning: you are using SQL Server 2008 or newer), then use those and not the collations with no version number in their names.

EDIT:
I found some additional info on the "Hungarian Technical" collation that might be of interest:

  • MySQL: Hungarian collation -- This is an archive of a listserv discussion by some MySQL developers trying to figure out the specifics of how to implement the collation. Apparently it has some complex rules, due at least in part to needing to equate multiple characters into a single character for sorting (look for section "3. Special digraph/trigraph rule"). This might cause some unexpected behavior. The specific letter combinations are noted in that section so at least it makes it easier to set up a test case to help determine if the sorting rules invalidate using this collation.

  • Why are there so many(106) Hungarian collations? -- This is a discussion on social.msdn that has some interesting info. There is a comment in there by Erland Sommarsko in which he states:

    I ran a crazy performance test to test all collations in SQL 2008. I had an idle server available, and the test ran for 8-9 days. And Hungarian_Technical came out as the slowest as I recall.

    Given the discussion from the MySQL devs about the complex rules, those test results don't seem all that surprising.

So, both of these items probably should be factored into deciding whether or not to go the easy route of simply setting the NVARCHAR field to one of these Hungarian Technical collations, or going with @GarethD's recommendation of the computed column.

For more info on working with strings and collations, please visit: Collations Info

like image 89
Solomon Rutzky Avatar answered Oct 23 '22 01:10

Solomon Rutzky