Why does SQL Server consider N'㐢㐢㐢㐢' and N'㐢㐢㐢' to be equal?

Tags:

unicode

We are testing our application for Unicode compatibility and have been selecting random characters outside the Latin character set for testing.

On both Latin and Japanese-collated systems the following equality is true (U+3422):

N'㐢㐢㐢㐢' = N'㐢㐢㐢'

but the following is not (U+30C1):

N'チチチチ' = N'チチチ'

This was discovered when a test case using the first example (using U+3422) violated a unique index. Do we need to be more selective about the characters we use for testing? Obviously we don't know the semantic meaning of the above comparisons. Would this behavior be obvious to a native speaker?


asked May 12 '10 12:05

Aidan Ryan

1 Answer

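Michael Kaplan has a blog post where he explains how Unicode strings are compared. It comes down to this: a string needs to have a weight; if it doesn't, it is considered equal to the empty string. That is why N'㐢㐢㐢㐢' and N'㐢㐢㐢' compare as equal: under these collations neither string carries any weight, so both are equal to the empty string and therefore to each other.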

Sorting it all Out: The jury will give this string no weight
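One way to check this explanation on a given server is to compare the character against the empty string directly. The following is a minimal sketch and is not taken from the blog post; Latin1_General_CI_AS is only an assumed example of an older collation, and the result depends on the SQL Server version and the collation actually used.

```sql
-- Sketch: test whether U+3422 carries any weight under a given collation.
-- Latin1_General_CI_AS is an assumed example of an older collation;
-- substitute the collation your database actually uses.
SELECT CASE
         WHEN N'㐢' = N'' COLLATE Latin1_General_CI_AS
           THEN 'no weight: equal to the empty string'
         ELSE 'has weight: not equal to the empty string'
       END AS weight_check;

-- The same comparison under a binary collation compares code points
-- rather than linguistic weights, so the character cannot be ignored.
SELECT CASE
         WHEN N'㐢' = N'' COLLATE Latin1_General_BIN2
           THEN 'equal'
         ELSE 'not equal'
       END AS binary_check;
```

If the first query reports that the character has no weight, any number of repetitions of it will compare as equal under that collation, which matches the unique index violation described in the question.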

In SQL Server, this weight depends on the collation in use. Microsoft added appropriate collations for CJK Unified Ideographs in Windows XP/2003 and SQL Server 2005. This post recommends using Chinese_Simplified_Pinyin_100_CI_AS or Chinese_Simplified_Stroke_Order_100_CI_AS:

You can always use any binary and binary2 collations although it wouldn't give you Linguistic correct result. For SQL Server 2005, you SHOULD use Chinese_PRC_90_CI_AS or Chinese_PRC_Stoke_90_CI_AS which support surrogate pair comparison (but not linguistic). For SQL Server 2008, you should use Chinese_Simplified_Pinyin_100_CI_AS and Chinese_Simplified_Stroke_Order_100_CI_AS which have better linguistic surrogate comparison. I do suggest you use these collation as your server/database/table collation instead of passing the collation name during comparison.

So the following SQL statement would work as expected, returning no rows because the two strings no longer compare as equal:

select * from MyTable where N'' = N'㐀' COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS;
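The quoted advice prefers setting the collation on the server, database, or table rather than passing it in each comparison. A rough sketch of the column-level variant might look like the following; the table, column, and index names are hypothetical, and the collation assumes SQL Server 2008 or later.

```sql
-- Hypothetical table; all object names are for illustration only.
CREATE TABLE dbo.CollationDemo (
    Id   int IDENTITY(1,1) PRIMARY KEY,
    -- Column-level collation, so comparisons and indexes on this column
    -- use it without a COLLATE clause in every query.
    Name nvarchar(50) COLLATE Chinese_Simplified_Stroke_Order_100_CI_AS NOT NULL
);

CREATE UNIQUE INDEX UX_CollationDemo_Name ON dbo.CollationDemo (Name);

-- If the collation assigns weights to these ideographs, the two values
-- are distinct and both inserts should pass the unique index.
INSERT INTO dbo.CollationDemo (Name) VALUES (N'㐢㐢㐢㐢');
INSERT INTO dbo.CollationDemo (Name) VALUES (N'㐢㐢㐢');
```

Declaring the collation on the column means the unique index itself is built with the weighted comparison, which is the case that failed in the question.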

A list of all supported collations can be found in MSDN:

SQL Server 2008 Books Online: Windows Collation Name


answered Nov 16 '22 03:11

Dirk Vollmar
