Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to decode hex sequence of unicode characters to string

Tags:

c#

unicode

decode

I'm working with C# .Net

I would like to know how to convert a Unicode form string like "\u1D0EC" (note that it's above "\uFFFF") to it's symbol... "𝃬"

Thanks For Advance!!!

like image 676
Jack Avatar asked Jan 02 '10 19:01

Jack


2 Answers

That Unicode codepoint is encoded in UTF32. .NET and Windows encode Unicode in UTF16, you'll have to translate. UTF16 uses "surrogate pairs" to handle codepoints above 0xffff, a similar kind of approach as UTF8. The first code of the pair is 0xd800..dbff, the second code is 0xdc00..dfff. Try this sample code to see that at work:

using System;
using System.Text;

class Program {
  static void Main(string[] args) {
    uint utf32 = uint.Parse("1D0EC", System.Globalization.NumberStyles.HexNumber);
    string s = Encoding.UTF32.GetString(BitConverter.GetBytes(utf32));
    foreach (char c in s.ToCharArray()) {
      Console.WriteLine("{0:X}", (uint)c);
    }
    Console.ReadLine();
  }
}
like image 166
Hans Passant Avatar answered Sep 21 '22 13:09

Hans Passant


Convert each sequence with int.Parse(String, NumberStyles) and char.ConvertFromUtf32:

string s = @"\U1D0EC";
string converted = char.ConvertFromUtf32(int.Parse(s.Substring(2), NumberStyles.HexNumber));
like image 37
bruno conde Avatar answered Sep 22 '22 13:09

bruno conde