How do i get the decimal value of a unicode character in C#?

How do i get the numeric value of a unicode character in C#?

For example if tamil character அ (U+0B85) given, output should be 2949 (i.e. 0x0B85)

Multi code-point characters

Some characters require multiple code points. In this example, UTF-16, each code unit is still in the Basic Multilingual Plane:

(i.e. U+0072 U+0327 U+030C)
(i.e. U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360)

The larger point being that one "character" can require more than 1 UTF-16 code unit, it can require more than 2 UTF-16 code units, it can require more than 3 UTF-16 code units.

The larger point being that one "character" can require dozens of unicode code points. In UTF-16 in C# that means more than 1 char. One character can require 17 char.

My question was about converting char into a UTF-16 encoding value. Even if an entire string of 17 char only represents one "character", i still want to know how to convert each UTF-16 unit into a numeric value.

e.g.

String s = "அ";

int i = Unicode(s[0]);

Where Unicode returns the integer value, as defined by the Unicode standard, for the first character of the input expression.

How do I find Unicode value of a character?

We can determine the unicode category for a particular character by using the getType() method. It is a static method of Character class and it returns an integer value of char ch representing in unicode general category.

How do I type Unicode decimals?

Holding Ctrl + ⇧ Shift and typing u followed by the hex digits, then releasing Ctrl + ⇧ Shift . Entering Ctrl + ⇧ Shift + u , releasing, then typing the hex digits and pressing ↵ Enter (or Space or even, on some systems, pressing and releasing ⇧ Shift or Ctrl ).

How does C handle Unicode?

It can represent all 1,114,112 Unicode characters. Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII. Characters usually require fewer than four bytes. String sort order is preserved.

It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:

char c = '\u0b85';

// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949

If you've got it as part of a string, just get that single character first:

string text = GetText();
int x = text[2]; // Or whatever...

Note that characters not in the basic multilingual plane will be represented as two UTF-16 code units. There is support in .NET for finding the full Unicode code point, but it's not simple.

How do i get the decimal value of a unicode character in C#?

Tags:

string

c#

unicode

localization

See also

Multi code-point characters

Ian Boyd

People also ask

1 Answers

Jon Skeet

Recent Activity

Donate For Us

How do i get the decimal value of a unicode character in C#?

Tags:

string

c#

unicode

localization

See also

Multi code-point characters

Ian Boyd

People also ask

1 Answers

Jon Skeet

Related questions

Recent Activity

Donate For Us