I came across this line of code today:
int c = (int)'c';
I was not aware you could cast a char to an int. So I tested it out and found that a = 97, b = 98, c = 99, d = 100, and so on.
Why is 'a' 97? What do those numbers relate to?
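Here's a minimal sketch along the lines of what I tested (using C# top-level statements; the exact program isn't shown above):

using System;

// Each char converts to the numeric value of its code unit.
foreach (char ch in "abcd")
    Console.WriteLine($"{ch} = {(int)ch}"); // a = 97, b = 98, c = 99, d = 100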
Everyone else (so far) has referred to ASCII. That's a very limited view: it works for 'a', but it doesn't work for anything with an accent etc. - which can very easily be represented by a char.
A char is just an unsigned 16-bit integer, which is a UTF-16 code unit. Usually that's equivalent to a Unicode character, but not always - sometimes multiple code units are required for a single full character. See the documentation for System.Char for more details.
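To illustrate the multi-code-unit case, here's a minimal sketch (the accented letter and the emoji are just assumed examples, again using top-level statements):

using System;

char e = 'é';                     // U+00E9: accented, not ASCII, but still one char
Console.WriteLine((int)e);        // 233

string emoji = "😀";              // U+1F600 needs two UTF-16 code units
Console.WriteLine(emoji.Length);  // 2 - the string holds a surrogate pair of chars
Console.WriteLine((int)emoji[0]); // 55357 (high surrogate, 0xD83D)
Console.WriteLine((int)emoji[1]); // 56832 (low surrogate, 0xDE00)

// char.ConvertToUtf32 combines the pair back into the full code point.
Console.WriteLine(char.ConvertToUtf32(emoji, 0)); // 128512 (0x1F600)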
The implicit conversion from char to int (you don't need the cast in your code) just converts that 16-bit unsigned integer to a 32-bit signed integer in the natural, non-lossy way - just as if you had a ushort.
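For example (another minimal sketch):

using System;

char c = 'c';
int i = c;            // implicit widening conversion - no cast needed
Console.WriteLine(i); // 99

ushort u = c;         // also implicit: char and ushort are both unsigned 16-bit
Console.WriteLine(u); // 99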
Note that every valid character in ASCII has the same value in UTF-16, which is why the two are often confused when the examples are only ones from the ASCII set.
97 is the UTF-16 code unit value of the letter 'a'.
Basically, this number is the value of the UTF-16 code unit for the given character.
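If you want the value in the U+XXXX form used in Unicode charts, a one-liner like this (an assumed example, not from the answer itself) prints it:

using System;

char a = 'a';
Console.WriteLine($"'{a}' = {(int)a} = U+{(int)a:X4}"); // 'a' = 97 = U+0061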