In SQL Server 2012 I have a table with an nvarchar column using the collation Latin1_General_100_CI_AS_SC, which is supposed to support Unicode supplementary characters (surrogate pairs).
When I run this query:
select KeyValue from terms where KeyValue = N'➰'
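The character above is the curly loop, code 10160 (U+27B0). I had assumed it was a supplementary character, but its code is below 65536, so it is stored as a single UTF-16 code unit rather than a surrogate pair.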
The result is hundreds of different-looking single-character entries, even though they all have different UTF-16 code points. Is this due to the collation? Why isn't there an exact match?
EDIT: I now think this is due to the collation. There seems to be a group of more than 1733 "undefined" characters with codes up to 65535 that this collation treats as identical. Characters with codes above 65535, however, are treated as unique, and queries for them return exact matches.
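One way to test this suspicion, sketched under the assumption that a binary collation such as Latin1_General_100_BIN2 is acceptable for a diagnostic query: list the code point of every row the original query matches, then repeat the comparison under the binary collation, which compares code units rather than linguistic sort weights. The column aliases are only illustrative.

```sql
-- Show what the column's collation considers equal to the curly loop.
SELECT KeyValue,
       UNICODE(KeyValue)               AS CodePoint, -- code point of the first character
       CAST(KeyValue AS varbinary(16)) AS RawBytes   -- stored UTF-16 bytes (first 16 only)
FROM terms
WHERE KeyValue = N'➰'
ORDER BY CodePoint;

-- Same comparison, forced to a binary collation for an exact match.
SELECT KeyValue
FROM terms
WHERE KeyValue = N'➰' COLLATE Latin1_General_100_BIN2;
```

If the first query lists many distinct code points while the second returns only the curly loop, the collation's comparison rules, not the stored data, would be responsible for the extra rows.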
The two queries below have different results:
select KeyValue from terms where KeyValue = N'π'
returns 3 rows: π and ℼ and ᴨ
select KeyValue from terms where KeyValue LIKE N'π'
returns 2 rows: π and ℼ
Why is this?
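To see exactly which rows each predicate accepts, a sketch like the following can flag both results side by side. The alias names are made up for illustration.

```sql
-- Compare the = and LIKE predicates row by row for the pi variants.
SELECT KeyValue,
       UNICODE(KeyValue) AS CodePoint,                                   -- code point of the first character
       CASE WHEN KeyValue = N'π'    THEN 1 ELSE 0 END AS MatchesEquals,  -- 1 when = accepts the row
       CASE WHEN KeyValue LIKE N'π' THEN 1 ELSE 0 END AS MatchesLike     -- 1 when LIKE accepts the row
FROM terms
WHERE KeyValue = N'π'
   OR KeyValue LIKE N'π';
```

Rows where the two flags differ are the ones the two operators disagree on.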
This is the weirdest of all. This query:
select KeyValue from terms where KeyValue like N'➰%'
returns ALMOST ALL records in the table, including many multi-character terms in the regular Latin character set such as "8w" or "apple". 90% of the rows that are not returned start with "æ". What is happening?
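A possible check, offered as a guess rather than a confirmed explanation: if the collation assigns the curly loop no weight, the pattern N'➰%' would behave much like N'%'. The sketch below first asks whether the character compares equal to an empty string under the column's collation, then reruns the prefix search under a binary collation (Latin1_General_100_BIN2 is my choice for the test, not something from the original setup).

```sql
-- Does this collation treat the curly loop as distinct from an empty string?
SELECT CASE WHEN N'➰' = N'' COLLATE Latin1_General_100_CI_AS_SC
            THEN 'compares equal to the empty string'
            ELSE 'compares different from the empty string'
       END AS CurlyLoopVersusEmpty;

-- Prefix search forced to a binary collation.
SELECT KeyValue
FROM terms
WHERE KeyValue LIKE N'➰%' COLLATE Latin1_General_100_BIN2;
```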
NOTE: Just to give this a bit of context, these are all Wikipedia article titles, not random strings.
The SQL Server instance, and thus tempdb, has its own collation, which may not be the same as a database's or a column's collation. Character literals should be assigned the default collation of the column or database, but the (perhaps overly simplified) T-SQL examples above may be hiding or misstating the true problem. For example, an ORDER BY clause could have been omitted for the sake of simplicity. Are the expected results returned when the statements above explicitly use a COLLATE clause (https://msdn.microsoft.com/en-us/library/ms184391.aspx), for example COLLATE Latin1_General_100_CI_AS_SC?
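A minimal sketch of what that check could look like, assuming the table lives in the current database under the default schema (the OBJECT_ID lookup depends on that):

```sql
-- Collations in play at the server and database level.
SELECT SERVERPROPERTY('Collation')                AS ServerCollation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Collation actually assigned to the column.
SELECT c.name AS ColumnName, c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'terms')
  AND c.name = N'KeyValue';

-- The original query with the collation stated explicitly.
SELECT KeyValue
FROM terms
WHERE KeyValue = N'➰' COLLATE Latin1_General_100_CI_AS_SC;
```

If the explicit COLLATE version returns the same rows as the original query, a mismatch between server, database, and column collations is unlikely to be the cause.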