How do I get the numeric value of a Unicode character in C#?
For example, if the Tamil character அ (U+0B85) is given, the output should be 2949 (i.e. 0x0B85).
Some characters require multiple code points. In these examples, encoded as UTF-16, each code unit is still in the Basic Multilingual Plane:
(U+0072 U+0327 U+030C)
(U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360)
The larger point is that one "character" can require more than one UTF-16 code unit; it can require dozens of Unicode code points. In UTF-16 in C#, that means more than one char. One character can require 17 char.
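As a rough illustration of the point above, the sketch below compares the number of char values in the first sequence with the number of user-perceived characters. It assumes C# 9 or later (top-level statements); the variable names are only illustrative.

```csharp
using System;
using System.Globalization;

// "r" followed by two combining marks: U+0072 U+0327 U+030C.
string sequence = "r\u0327\u030C";

// Length counts UTF-16 code units (char values), not visible characters.
int codeUnits = sequence.Length;

// StringInfo groups a base character with its combining marks.
int textElements = new StringInfo(sequence).LengthInTextElements;

Console.WriteLine(codeUnits);    // 3
Console.WriteLine(textElements); // 1
```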
My question was about converting a char into its UTF-16 encoding value. Even if an entire string of 17 char only represents one "character", I still want to know how to convert each UTF-16 unit into a numeric value.
e.g.
String s = "அ";
int i = Unicode(s[0]);
Where Unicode returns the integer value, as defined by the Unicode standard, for the first character of the input expression.
It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:
char c = '\u0b85';
// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949
If you've got it as part of a string, just get that single character first:
string text = GetText();
int x = text[2]; // Or whatever...
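To get the value of every UTF-16 code unit rather than just one, the same conversion can be applied in a loop. This is a minimal sketch, assuming C# 9 or later (top-level statements); CodeUnitValues is a hypothetical helper name, and the sample string is made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the numeric value of each UTF-16 code unit in a string.
static int[] CodeUnitValues(string s) => s.Select(c => (int)c).ToArray();

// Sample input: the Tamil letter from the question, then "r" with two combining marks.
string sample = "\u0B85r\u0327\u030C";

foreach (int value in CodeUnitValues(sample))
{
    // Prints the hex form next to the decimal value, e.g. "U+0B85 = 2949".
    Console.WriteLine($"U+{value:X4} = {value}");
}
```

Each char is reported separately here, so a "character" made of several code units produces several lines of output.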
Note that characters not in the basic multilingual plane will be represented as two UTF-16 code units. There is support in .NET for finding the full Unicode code point, but it's not simple.
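One way to get full code points is char.ConvertToUtf32 together with char.IsHighSurrogate. The sketch below is only one possible approach, assuming C# 9 or later (top-level statements) and well-formed input; CodePoints is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the Unicode code points of a string,
// combining each surrogate pair into a single value.
static List<int> CodePoints(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        // ConvertToUtf32 reads one code point starting at index i.
        // It throws ArgumentException on an unpaired surrogate.
        result.Add(char.ConvertToUtf32(s, i));

        // A high surrogate means the next char was already consumed.
        if (char.IsHighSurrogate(s[i]))
        {
            i++;
        }
    }
    return result;
}

// U+0B85 is in the BMP; U+1F600 lies outside it and needs two chars.
string mixed = "\u0B85\U0001F600";
Console.WriteLine(mixed.Length);                        // 3
Console.WriteLine(string.Join(", ", CodePoints(mixed))); // 2949, 128512
```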