I'm trying to understand a piece of code which uses __ctype_b_loc(), but I don't know what the purpose of this function is.
So far, I have found that it is declared in ctype.h. I also found its prototype and an implementation, but I still have no idea what this function is for.
Can someone enlighten me?
After some research, I think I can answer this question myself.
unsigned short int** __ctype_b_loc (void) is a function which returns a pointer to a 'traits' table containing flags that describe the characteristics of each individual character.
Here's the enum with the flags, from ctype.h:
enum
{
_ISupper = _ISbit (0), /* UPPERCASE. */
_ISlower = _ISbit (1), /* lowercase. */
_ISalpha = _ISbit (2), /* Alphabetic. */
_ISdigit = _ISbit (3), /* Numeric. */
_ISxdigit = _ISbit (4), /* Hexadecimal numeric. */
_ISspace = _ISbit (5), /* Whitespace. */
_ISprint = _ISbit (6), /* Printing. */
_ISgraph = _ISbit (7), /* Graphical. */
_ISblank = _ISbit (8), /* Blank (usually SPC and TAB). */
_IScntrl = _ISbit (9), /* Control character. */
_ISpunct = _ISbit (10), /* Punctuation. */
_ISalnum = _ISbit (11) /* Alphanumeric. */
};
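For example, if you look up the character whose ASCII code is 0x30 ('0') in the table that __ctype_b_loc() returns, you will get 0x08d8.
0x08d8 = 0000 1000 1101 1000 (Alphanumeric, Graphical, Printing, Hexadecimal, Numeric), reading flag n as bit n. On a little-endian machine glibc's _ISbit macro swaps the two bytes, so the raw value read from the table there is 0xd808.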
The table is tied to the character classification (LC_CTYPE) of the current locale, so the example may not match the results you get on your system.
Alessandro's own answer is very informative, but I would like to add some information.
As stated by Alessandro, the __ctype_b_loc(void) function gives access to an array where each element contains the features of one character. For instance, by looking it up in the table we can learn that the character 'A' is uppercase, alphabetic, hexadecimal, graphical, printing and alphanumeric.
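To be precise, the __ctype_b_loc() function returns a const unsigned short int**, that is, a pointer to a pointer into an array of 384 unsigned short int. The inner pointer points 128 elements into the array, which is why negative indexes are valid.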
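The reason there are 384 elements is so that the table can be indexed by:
- any unsigned char value [0,255] (so 256 elements)
- EOF (-1) (so 1 element)
- any signed char value [-128,-1) (so 127 elements)
This table is used by the character classification functions of ctype.h, one per flag: isupper, islower, isalpha, isdigit, isxdigit, isspace, isprint, isgraph, isblank, iscntrl, ispunct and isalnum.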
However, with glibc these functions are usually implemented as macros in C, so you will typically not see a call to them in the assembly code. What you will see is a call to __ctype_b_loc() to get the table, some code to retrieve the correct entry, and a bit mask to test whether the property being checked is set. For instance, to see whether a character is uppercase, we check whether flag 0 (_ISupper) is set.
Here is the assembly code generated by calling isupper('A'):
call sym.imp.__ctype_b_loc ; isupper('A');
mov rax, qword [rax] ; load the pointer to the table of 'unsigned short int'
movsx rdx, byte 0x41 ; prepare to look up for character 'A'
add rdx, rdx ; each entry is 2 bytes, so we double the value of 'A'
add rax, rdx ; look up for 'A' in the table
movzx eax, word [rax] ; get the 'unsigned short int' containing the properties
movzx eax, ax
and eax, 0x100 ; test _ISupper (flag 0), which is stored as 0x0100 on little-endian and 0x0001 on big-endian