SQL Server is stripping out unicode in equals query

Tags: sql, sql-server

Can anyone explain why when I run:

select * from (select N'someString💩' as id)_id where _id.id = N'someString'

in SQL Server, I get a result of someString💩?

I am not doing a LIKE comparison, and I cannot find anything in the documentation that explains this behaviour. I need either an exact match or validation rules on my server that exclude any character that behaves like this.

The database uses the collation SQL_Latin1_General_CP1_CI_AS, in case that has any impact.

asked Apr 01 '19 13:04 by GavinF



2 Answers

It appears that the poo emoji is skipped regardless of its position. For example, the query below returns every row:

SELECT *
FROM (VALUES(N'someString💩'),
            (N'💩someString'),
            (N'some💩String'),
            (N'some💩String💩')) V(S)
WHERE S = N'someString';
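Since the question also asks about validation, here is a rough way to flag values that contain this kind of character. It is only a sketch under a few assumptions: it needs SQL Server 2012 or later (where the _SC collations exist), the two collation names are my own choice rather than anything from the question, and it detects supplementary characters (surrogate pairs, such as most emoji) only, not every character the original collation might ignore.

```sql
-- Sketch: flag values that contain supplementary characters (surrogate pairs).
-- Under a non-SC collation, LEN counts UTF-16 code units, so a surrogate pair
-- counts as 2; under an _SC collation the pair counts as 1 character.
SELECT S,
       LEN(S COLLATE Latin1_General_100_CI_AS)    AS code_units,
       LEN(S COLLATE Latin1_General_100_CI_AS_SC) AS characters
FROM (VALUES (N'someString💩'),
             (N'someString')) V(S)
WHERE LEN(S COLLATE Latin1_General_100_CI_AS)
      <> LEN(S COLLATE Latin1_General_100_CI_AS_SC);
```

Both LEN calls use an explicit collation so that the check does not depend on the database default.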

With a binary collation, the emoji is no longer skipped, so none of these rows match:

SELECT *
FROM (VALUES(N'someString💩' COLLATE SQL_Latin1_General_CP850_BIN),
            (N'💩someString' COLLATE SQL_Latin1_General_CP850_BIN),
            (N'some💩String' COLLATE SQL_Latin1_General_CP850_BIN),
            (N'some💩String💩' COLLATE SQL_Latin1_General_CP850_BIN)) V(S)
WHERE S = N'someString';

If it matters that SQL Server treats these Unicode characters (such as emoji) as significant in comparisons, a binary collation will likely be your best choice.
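If changing the collation of the data itself is not an option, the binary collation can also be applied to a single comparison. The sketch below assumes Latin1_General_100_BIN2 (my choice, not from the question; any binary collation should behave similarly here) and adds a lowercase row to show the trade-off: a binary comparison is also case-sensitive.

```sql
-- Sketch: force a binary comparison for this predicate only.
-- Latin1_General_100_BIN2 is an assumed choice of binary collation.
SELECT *
FROM (VALUES (N'someString💩'),
             (N'some💩String'),
             (N'somestring'),    -- differs from the search term only by case
             (N'someString')) V(S)
WHERE S = N'someString' COLLATE Latin1_General_100_BIN2;
```

Only the exact N'someString' row should come back. The lowercase row is excluded as well, which may or may not be what you want if the rest of the application relies on case-insensitive matching.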

answered Oct 06 '22 02:10 by Larnu


You were probably on the right track: the collation defines how the comparison is made. Here is one option that could help you solve your problem:

SELECT *
FROM (VALUES(N'someString💩'),
            (N'💩someString'),
            (N'some💩String'),
            (N'some💩String💩'),
            (N'somestring')) V(S)
WHERE S = N'someString' COLLATE Latin1_General_100_CI_AI_KS_WS

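This collation is described as Latin1-General-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive. Because it is case-insensitive, the lowercase row N'somestring' still compares equal to N'someString'.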
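To see how the choice of collation changes the result, the same pair of values can be compared under several collations at once. This is a small sketch, not a definitive test: the variable names are made up for illustration, and Latin1_General_100_BIN2 is an assumed binary collation that does not appear in the question.

```sql
-- Sketch: compare the same two values under three collations.
-- @stored and @wanted are hypothetical names used only for this illustration.
DECLARE @stored nvarchar(50) = N'someString💩';
DECLARE @wanted nvarchar(50) = N'someString';

SELECT
    CASE WHEN @stored = @wanted COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 1 ELSE 0 END AS sql_collation_match,   -- collation from the question
    CASE WHEN @stored = @wanted COLLATE Latin1_General_100_CI_AI_KS_WS
         THEN 1 ELSE 0 END AS v100_collation_match,  -- collation from this answer
    CASE WHEN @stored = @wanted COLLATE Latin1_General_100_BIN2
         THEN 1 ELSE 0 END AS binary_match;          -- assumed binary collation
```

Going by the behaviour reported in this thread, the first column should be 1 and the other two 0, but it is worth running this against your own instance and version to confirm.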

answered Oct 06 '22 03:10 by Luis Cazares