SQL sort order in Japanese breaks when text includes non-Japanese characters

Tags:

sql-server

It seems that Japanese sorting "breaks" when the text contains non-Japanese characters, even when I force any of the possible collations in the ORDER BY clause.

I would like to know if this is a known phenomenon, and what a solution could be.

In the end I'm looking for a kana-type-insensitive, case-sensitive sort, while searching should be kana-type insensitive and case insensitive.
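As a minimal sketch of where such requirements would go in a query: in SQL Server collation names, kana sensitivity is opt-in through the _KS suffix, so a collation without it is kana-type insensitive. The collation names Japanese_CI_AS and Japanese_CS_AS and the table variable @t are assumptions chosen for illustration. The sketch only shows where the COLLATE clauses are placed; it does not claim to fix the ordering problem described here.

```sql
-- Sketch only: collation names and @t are assumptions for illustration.
DECLARE @t TABLE (title nvarchar(5));
INSERT INTO @t VALUES (N'かか2'), (N'カカ4'), (N'がが1');

-- Search: kana-type insensitive and case insensitive (no _KS, _CI).
SELECT title
FROM @t
WHERE title COLLATE Japanese_CI_AS LIKE N'カカ%';

-- Sort: kana-type insensitive but case sensitive (no _KS, _CS).
SELECT title
FROM @t
ORDER BY title COLLATE Japanese_CS_AS;
```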

Here is the test case:

From the script below, I would expect both queries to return the same order (the expected sort order is in the third column). The first query sorts by the complete word; the second sorts manually by the first character, then the second, then the third.

Given the DB collation SQL_Latin1_General_CP1_CI_AS:

declare  @temp as table  (title nvarchar(5),  expected int,  script varchar(40) )

set nocount on
INSERT INTO @temp values(N'かか7', 4,'hiragana no accent')
INSERT INTO @temp values(N'がが6',7,'hiragana with accent') 
INSERT INTO @temp values(N'いい5',1,'earlier letter hiragana no accent') 
INSERT INTO @temp values(N'カカ4',3, 'katakana no accent') 
INSERT INTO @temp values(N'ガガ3',6, 'katakana with accent') 
INSERT INTO @temp values(N'かか2',2, 'hiragana no accent') 
INSERT INTO @temp values(N'がが1', 5, 'hiragana with accent')

--BAD
select unicode(left(title,1)) 'bin', * from @temp order by title  
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1), substring(title,3,1)

However, only the second version works; the first one doesn't sort correctly:

(screenshot: the two result sets)

It seems to be related to the digits in the title field: when I remove them, both queries return the same order.

declare  @temp as table  (title nvarchar(5),  expected int,  script varchar(40) )

set nocount on
INSERT INTO @temp values(N'かか', 2,'hiragana no accent')
INSERT INTO @temp values(N'がが',3,'hiragana with accent') 
INSERT INTO @temp values(N'いい',1,'earlier letter hiragana no accent') 
INSERT INTO @temp values(N'カカ',2, 'katakana no accent') 
INSERT INTO @temp values(N'ガガ',3, 'katakana with accent') 
INSERT INTO @temp values(N'かか',2, 'hiragana no accent') 
INSERT INTO @temp values(N'がが', 3, 'hiragana with accent')

--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by title  
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1)

See the results here:

(screenshot: correct sort order)

Does anybody have a clue why, and possibly a solution?


asked Dec 09 '19 13:12

Gideon

1 Answer

Brute-force approach: Checking all supported collations in SQL Server:

create table ##temp(title nvarchar(5),  expected int,  script varchar(40) );

INSERT INTO ##temp values(N'かか7', 4,'hiragana no accent');
INSERT INTO ##temp values(N'がが6',7,'hiragana with accent');
INSERT INTO ##temp values(N'いい5',1,'earlier letter hiragana no accent'); 
INSERT INTO ##temp values(N'カカ4',3, 'katakana no accent');
INSERT INTO ##temp values(N'ガガ3',6, 'katakana with accent'); 
INSERT INTO ##temp values(N'かか2',2, 'hiragana no accent');
INSERT INTO ##temp values(N'がが1', 5, 'hiragana with accent');

And the script:

CREATE TABLE result(collation_name NVARCHAR(1000));
DECLARE @collate_name NVARCHAR(1000);
DECLARE @sql NVARCHAR(MAX);

DECLARE c CURSOR FOR
SELECT name FROM sys.fn_helpcollations() /* where name LIKE '%japan%'*/;

OPEN c;
FETCH NEXT FROM c INTO @collate_name;

WHILE @@FETCH_STATUS = 0  
BEGIN  
     SET @sql = REPLACE(
 N'with cte as (
  select bin = unicode(left(title,1)),expected
         ,rn= row_number() over(order by title collate <collate>)
         ,collation = ''<collate>''
   from ##temp 
)
select collation
from cte
where expected = rn GROUP BY collation HAVING COUNT(*) = 7'
     , '<collate>', @collate_name);
     -- debug
     --PRINT @sql;

     INSERT INTO result(collation_name) EXEC (@sql);
     FETCH NEXT FROM c INTO @collate_name;
END 

SELECT * FROM result;

CLOSE c; 
DEALLOCATE c;

db<>fiddle demo

Result: no collation in SQL Server 2017 matches the expected order.
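Since no single collation reproduces the expected order, one possible workaround is to keep the character-by-character ORDER BY from the question and extend it to every position of the column. A minimal sketch, assuming the column stays nvarchar(5) as in the test table and that the column's collation behaves as described in the question:

```sql
-- Sketch only: compares one character at a time, so a difference in
-- an earlier character is decided before later characters are looked at.
SELECT title, expected
FROM ##temp
ORDER BY SUBSTRING(title, 1, 1),
         SUBSTRING(title, 2, 1),
         SUBSTRING(title, 3, 1),
         SUBSTRING(title, 4, 1),
         SUBSTRING(title, 5, 1);  -- one term per position of nvarchar(5)
```

Because the sort keys are expressions, this ordering cannot be served from an index on title, so it may be slow on large tables.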


answered Oct 12 '22 23:10

Lukasz Szozda
