Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chinese collation for MS SQL

I found in our production DB for Asia market weird behavior. Where condition is not working as one would expect in case of Chinese letters.

create table #Temp  (TextContent nvarchar(20), ChineseType varchar(10))

insert #Temp values (N'㱔', '??')  --odd
insert #Temp values (N'𨿅', '??')  --odd
insert #Temp values (N'龪', '??')  --odd
insert #Temp values (N'㕦', 'prc') --odd
insert #Temp values (N'谷', 'prc')
insert #Temp values (N'丑', 'prc')
insert #Temp values (N'苹', 'prc')
insert #Temp values (N'松', 'prc')
insert #Temp values (N'穀', 'taiwan')
insert #Temp values (N'醜', 'taiwan')
insert #Temp values (N'蘋', 'taiwan')
insert #Temp values (N'鬆', 'taiwan')
insert #Temp values (N'隻', 'taiwan')

select * from #Temp where TextContent like ''
select * from #Temp where TextContent like N'𨿅'
select * from #Temp where TextContent like N'㕦'
-- all will return
-- |TextContent | ChineseType |
-- |     㱔      |     ??      |
-- |     𨿅      |     ??      |
-- |     龪      |     ??      |
-- |     㕦      |     prc     |

First I found that default collation is SQL_Latin1_General_CP1_CI_AS therefore I google some theory about Chinese alphabet, sorting, collation and then I tried Chinese_PRC_CI_AS, Chinese_PRC_CI_AI, Chinese_PRC_CI_AS_KS_WS, Chinese_PRC_CS_AS_KS_WS but without success. Always returning same results.

select * from #Temp where TextContent like N'㕦' COLLATE Chinese_PRC_CI_AS
select * from #Temp where TextContent like N'㕦' COLLATE Chinese_PRC_CI_AI
-- all will return
-- |TextContent | ChineseType |
-- |     㱔      |     ??      |
-- |     𨿅      |     ??      |
-- |     龪      |     ??      |
-- |     㕦      |     prc     |

The only 'working as expected' is binary collation eg. Chinese_PRC_BIN, Chinese_PRC_BIN2, Latin1_General_BIN.

  • Have someone explanation why is not working Chinese_PRC_CI_AS?
  • What is Chinese_PRC_BIN sorting order type? Is it Chinese radical (strokes)?

Thanks

like image 439
Jaroslav Kubacek Avatar asked Feb 15 '17 15:02

Jaroslav Kubacek


1 Answers

After some investigation I found standard GB 18030 which defines the required language and character support necessary for software in China. And I found that Microsoft's System Center Configuration Manager is compatible with this standard only in case that db collation is set to one of following:

  • Chinese_Simplified_Pinyin_100_CI_AI
  • Chinese_Simplified_Stroke_Order_100_CI_AI

Our customer prefer stroke order so I tried Chinese_Simplified_Stroke_Order_100_CI_AI and it works fine.

Even if it is not answer why Chinese_PRC_CI_AS does not work or more details about Chinese_PRC_BIN sort it's applicable solution.

like image 120
Jaroslav Kubacek Avatar answered Sep 30 '22 08:09

Jaroslav Kubacek