It seems that Japanese sorting "breaks" when the text contains non-japanese text, even when forcing any possible collation after the sort
part of the query.
I would like to know if this is a known phenomenon, and what a solution could be.
In the end I'm look for a kana type insensitive, case sensitive sorting, while searching should be kana type insensitive and case insensitive
Here is the test case:
I would assume from the script below, that I get the same results in both queries (the expected sort order is in the third column). Basically once I sort by the complete word, and once I sort manually by the first letter, then the second and then third letter.
Given the DB collation SQL_Latin1_General_CP1_CI_AS
declare @temp as table (title nvarchar(5), expected int, script varchar(40) )
set nocount on
INSERT INTO @temp values(N'かか7', 4,'hiragana no accent')
INSERT INTO @temp values(N'がが6',7,'hiragana with accent')
INSERT INTO @temp values(N'いい5',1,'earlier letter hiragana no accent')
INSERT INTO @temp values(N'カカ4',3, 'katakana no accent')
INSERT INTO @temp values(N'ガガ3',6, 'katakana with accent')
INSERT INTO @temp values(N'かか2',2, 'hiragana no accent')
INSERT INTO @temp values(N'がが1', 5, 'hiragana with accent')
--BAD
select unicode(left(title,1)) 'bin', * from @temp order by title
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1), substring(title,3,1)
However only the second version works, the first one doesn't sort correctly:
It seems it has to do with the numbers in the title
field, since when I remove them, I do get the same order.
declare @temp as table (title nvarchar(5), expected int, script varchar(40) )
set nocount on
INSERT INTO @temp values(N'かか', 2,'hiragana no accent')
INSERT INTO @temp values(N'がが',3,'hiragana with accent')
INSERT INTO @temp values(N'いい',1,'earlier letter hiragana no accent')
INSERT INTO @temp values(N'カカ',2, 'katakana no accent')
INSERT INTO @temp values(N'ガガ',3, 'katakana with accent')
INSERT INTO @temp values(N'かか',2, 'hiragana no accent')
INSERT INTO @temp values(N'がが', 3, 'hiragana with accent')
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by title
--GOOD
select unicode(left(title,1)) 'bin', * from @temp order by left(title,1),substring(title,2,1)
See here the results:
Does anybody have a clue why, and possibly a solution?
The SQL ORDER BY Keyword The ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.
If you want to sort based on two columns, separate them by commas. For example, ORDER BY LAST_NAME ASC, FIRST_NAME DESC; would display results sorted alphabetically by last name. If the same LAST_NAME matches multiple FIRST_NAME entries, the results of FIRST_NAME will also display in descending order.
To do so is easy. Just add DESC, The SQL DESC keyword specifies a descending sort order for a column within in the SQL ORDER BY clause. Values are sorted in Z to A order.
Brute-force approach: Checking all supported collations in SQL Server:
create table ##temp(title nvarchar(5), expected int, script varchar(40) );
INSERT INTO ##temp values(N'かか7', 4,'hiragana no accent');
INSERT INTO ##temp values(N'がが6',7,'hiragana with accent');
INSERT INTO ##temp values(N'いい5',1,'earlier letter hiragana no accent');
INSERT INTO ##temp values(N'カカ4',3, 'katakana no accent');
INSERT INTO ##temp values(N'ガガ3',6, 'katakana with accent');
INSERT INTO ##temp values(N'かか2',2, 'hiragana no accent');
INSERT INTO ##temp values(N'がが1', 5, 'hiragana with accent');
And script:
CREATE TABLE result(collation_name NVARCHAR(1000));
DECLARE @collate_name NVARCHAR(1000);
DECLARE @sql NVARCHAR(MAX);
DECLARE c CURSOR FOR
SELECT name FROM sys.fn_helpcollations() /* where name LIKE '%japan%'*/;
OPEN c;
FETCH NEXT FROM c INTO @collate_name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = REPLACE(
N'with cte as (
select bin = unicode(left(title,1)),expected
,rn= row_number() over(order by title collate <collate>)
,collation = ''<collate>''
from ##temp
)
select collation
from cte
where expected = rn GROUP BY collation HAVING COUNT(*) = 7'
, '<collate>', @collate_name);
-- debug
--PRINT @sql;
INSERT INTO result(collation_name) EXEC (@sql);
FETCH NEXT FROM c INTO @collate_name;
END
SELECT * FROM result;
CLOSE c;
DEALLOCATE c;
db<>fiddle demo
Result: There is no collation in SQL Server 2017 that will match "expected order".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With