
How to get a char's unicode value?

Tags:

rust

I want to get the Unicode value of each kanji in a string. It might look something like let values: &[u16] = f("ののの");

When I use "の".as_bytes() I get [227, 129, 174].

When I use 'の'.escape_unicode() I get \u{306e}, and 0x306e is exactly what I want.

AurevoirXavier asked Oct 21 '18 20:10



1 Answer

The char type can be cast to u32 using as. The line

println!("{:x}", 'の' as u32);

will print "306e" (using {:x} to format the number as hex).
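To get the values for a whole string, as in the question's f("ののの"), the same cast can be applied to each char. This is only a minimal sketch: the function name code_points is made up for illustration, and it returns an owned Vec<u32> rather than the &[u16] from the question, since the values have to be stored somewhere.

```rust
/// Collects the Unicode scalar value of every char in `s`.
/// `code_points` is a hypothetical helper name, not a standard library function.
fn code_points(s: &str) -> Vec<u32> {
    // `chars()` decodes the UTF-8 bytes into chars; `as u32` yields each scalar value.
    s.chars().map(|c| c as u32).collect()
}
```

Calling code_points("ののの") and formatting the result with {:x?} should show three copies of 306e.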

If you are sure all your characters are in the Basic Multilingual Plane (BMP), you can in theory also cast directly to u16. For characters from the supplementary planes this silently truncates the value, though: '🝖' as u16 evaluates to 0xf756 instead of the correct 0x1f756, so you need a strong reason to do this.
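If u16 values really are needed, a checked conversion avoids the silent truncation. The sketch below assumes you would rather get None than a wrong value for anything outside the BMP; the names bmp_code_unit and bmp_code_units are hypothetical.

```rust
use std::convert::TryFrom;

/// Returns the code point as a u16, or None if `c` does not fit in 16 bits.
/// Hypothetical helper, shown only to illustrate a checked conversion.
fn bmp_code_unit(c: char) -> Option<u16> {
    // u32::from(c) is the lossless equivalent of `c as u32`;
    // u16::try_from fails for values above 0xFFFF instead of truncating.
    u16::try_from(u32::from(c)).ok()
}

/// Converts a whole string, giving None as soon as one char falls outside the BMP.
fn bmp_code_units(s: &str) -> Option<Vec<u16>> {
    s.chars().map(bmp_code_unit).collect()
}
```

With this, bmp_code_unit('の') gives Some(0x306e), while bmp_code_unit('🝖') gives None rather than 0xf756. If UTF-16 code units (including surrogate pairs) are acceptable instead, str::encode_utf16() is the standard library route.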

Internally, a char is stored as a 32-bit number, so c as u32 for some character c only reinterprets the memory representation of the character as a u32.
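A small sketch of what that means in practice: a char occupies the same four bytes as a u32, and the cast can be undone with char::from_u32. The reverse direction returns an Option because not every u32 is a valid char (surrogates and values above 0x10FFFF are rejected). The helper names here are made up for illustration.

```rust
use std::mem::size_of;

/// True if char and u32 occupy the same number of bytes (four).
fn char_is_u32_sized() -> bool {
    size_of::<char>() == size_of::<u32>()
}

/// Casts a char to u32 and back; hypothetical helper for illustration.
fn round_trip(c: char) -> Option<char> {
    // `as u32` never fails, but from_u32 must validate the value.
    char::from_u32(c as u32)
}
```

Here round_trip('の') gives back Some('の'), whereas char::from_u32(0xD800) is None because 0xD800 is a surrogate.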

Sven Marnach answered Oct 13 '22 02:10
