 

Unicode alphanumeric character range

Tags:

unicode

I'm looking at the IsCharAlphaNumeric Windows API function. As it only takes a single TCHAR, it obviously can't make any decisions about surrogate pairs in UTF-16 content. Does that mean that there are no alphanumeric characters that are surrogate pairs?

Puppy asked Oct 31 '25 00:10



1 Answer

Characters outside the BMP can be letters. (Michael Kaplan recently discussed a bug in the classification of the character U+1F48C.) But IsCharAlphaNumeric cannot see characters outside the BMP (for the reasons you noted), so you cannot obtain classification information for them that way.
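To make the limitation concrete, here is a minimal sketch in portable C (no Windows API calls) of how a UTF-16 surrogate pair maps to a single code point above U+FFFF. The helper names `is_high_surrogate`, `is_low_surrogate`, and `decode_surrogate_pair` are hypothetical, introduced only for illustration. Each half of the pair lies in the reserved range D800 to DFFF and says nothing about the character on its own, which is why a function that receives one code unit at a time cannot classify such characters.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the Windows API. */

/* A UTF-16 code unit in D800..DBFF is a high (lead) surrogate. */
static int is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* A UTF-16 code unit in DC00..DFFF is a low (trail) surrogate. */
static int is_low_surrogate(uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Combine a valid high/low pair into the code point it encodes.
   The caller is assumed to have checked both halves first. */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    return 0x10000u
         + ((uint32_t)(high - 0xD800u) << 10)
         + (uint32_t)(low - 0xDC00u);
}
```

For example, the character U+1F48C mentioned above is stored in UTF-16 as the two code units D83D DC8C, and `decode_surrogate_pair(0xD83D, 0xDC8C)` yields 0x1F48C.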

If you have a surrogate pair, call GetStringType with cchSrc = 2 and check for C1_ALPHA and C1_DIGIT.

Edit: The second half of this answer is incorrect. GetStringType does not support surrogate pairs.

Raymond Chen answered Nov 03 '25 21:11
