
How do I get the decimal value of a Unicode character in C#?

How do I get the numeric value of a Unicode character in C#?

For example, if the Tamil character அ (U+0B85) is given, the output should be 2949 (i.e. 0x0B85).

See also

  • C++: How to get decimal value of a unicode character in c++
  • Java: How can I get a Unicode character's code?

Multi code-point characters

Some characters require multiple code points. In the examples below, encoded as UTF-16, each code unit is still in the Basic Multilingual Plane:

  • (image of one rendered character) U+0072 U+0327 U+030C
  • (image of one rendered character) U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360

The larger point is that one "character" can require more than one UTF-16 code unit. It can require two, three, or even dozens of Unicode code points. In UTF-16 in C#, that means more than one char: a single character can require 17 char.

My question was about converting a char into its UTF-16 encoding value. Even if an entire string of 17 char represents only one "character", I still want to know how to convert each UTF-16 code unit into a numeric value.

e.g.

String s = "அ";

int i = Unicode(s[0]);

Where Unicode returns the integer value, as defined by the Unicode standard, for the first character of the input expression.

Ian Boyd asked Oct 19 '11 18:10



1 Answer

It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:

char c = '\u0b85';

// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949

If you've got it as part of a string, just get that single character first:

string text = GetText();
int x = text[2]; // Or whatever...
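To cover the multi-unit case from the question, the same conversion can be applied to every char in a string. The following is only a minimal sketch: the class name CodeUnits and the method name Of are hypothetical, chosen for illustration.

```csharp
using System;
using System.Linq;

static class CodeUnits
{
    // Hypothetical helper: returns the numeric value of each UTF-16
    // code unit (each char) in the string, in order.
    public static int[] Of(string text)
    {
        // The explicit cast is only there to steer Select towards int;
        // char converts to int implicitly.
        return text.Select(c => (int)c).ToArray();
    }
}
```

For the three-unit example from the question, CodeUnits.Of("\u0072\u0327\u030C") would give 114, 807 and 780, one value per char, even though the string renders as a single "character".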

Note that characters not in the basic multilingual plane will be represented as two UTF-16 code units. There is support in .NET for finding the full Unicode code point, but it's not simple.
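One possible way to get full code points is char.ConvertToUtf32 together with char.IsSurrogatePair, both of which are part of .NET. The sketch below is an illustration rather than a definitive implementation: the class name CodePoints and the method name Of are hypothetical, and it assumes the input is well-formed UTF-16 (ConvertToUtf32 throws an ArgumentException on an unpaired surrogate).

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Hypothetical helper: returns the Unicode code points of a string,
    // combining each surrogate pair into a single value.
    public static List<int> Of(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Reads one code point starting at index i; throws
            // ArgumentException if text[i] is an unpaired surrogate.
            result.Add(char.ConvertToUtf32(text, i));

            // A surrogate pair occupies two chars, so skip the low half.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
            }
        }
        return result;
    }
}
```

With this sketch, CodePoints.Of("\uD83D\uDE00") would give the single value 128512 (U+1F600), while a BMP-only string such as "\u0b85" gives the same 2949 as the plain char conversion.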

Jon Skeet answered Nov 02 '22 06:11