Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize unicode string in SQL Server?

Is there a function in SQL Server to normalize a unicode string? e.g.

UPDATE Orders SET Notes = NormalizeString(Notes, 'FormC')

Unicode Normalization Forms:

  • C​omposition (C): A + ¨ becomes Ä
  • D​ecomposition (D): Ä becomes A + ¨
  • Compatible Composition (KC): A + ¨ + + n becomes Ä + f + i + n
  • Compatible Decomposition (KD): Ä + + n becomes A + ¨ + f + i + n

i cannot find any built-in function, so i assume there is none.


Ideally, if there can be only one, then i happen to need Form C today:

Unicode normalization form C, canonical composition. Transforms each decomposed grouping, consisting of a base character plus combining characters, to the canonical precomposed equivalent. For example, A + ¨ becomes Ä.

See also

  • Unicode Normalization in Windows
  • How do I remove diacritics (accents) from a string in .NET?
  • NormalizeString function
  • Sorting it all out: What normalization form does SQL Server use
like image 501
Ian Boyd Avatar asked Oct 18 '11 18:10

Ian Boyd


1 Answers

Sorry, no, there is no such function in any version of SQL Server to date (2012 test builds). Comparisons can be correctly composition-insensitive, but there isn't a function to convert character composition usage to one normal form.

It has been suggested for a future version of the ANSI standard under the syntax NORMALIZE(string, NFC) but it's going to be a long time before this makes it to the real world. For now if you want to do normalisation you'll have to do it in a proper programming language with better string-handling capabilities, either by pulling the string out of the database or by writing a CLR stored procedure to do it.

like image 66
bobince Avatar answered Sep 22 '22 20:09

bobince