I saw this implementation of strcmp a while back, and I have a question for purely educational purposes. Why is it necessary to convert the inputs to 16-bit integers, do the math, and then convert back to 8 bits? What is wrong with doing the subtraction in 8 bits?
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
    while ( *s1 && (*s1 == *s2) )
    {
        s1++;
        s2++;
    }
    return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}
Note: the code assumes a 16-bit int type.
EDIT:
It was mentioned that C converts to int (suppose 32-bit) by default. Is that the case even when the code explicitly casts to a 16-bit int?
The strcmp(a,b) function is expected to return:

<0  if string a < string b
>0  if string a > string b
 0  if string a == string b
The test is actually made on the first character that differs between the two strings at the same position (0, the string terminator, works as well).
Here, since the function takes two uint8_t (unsigned char) pointers, the developer was probably worried that subtracting two unsigned chars would give a number between 0 and 255, so a negative value could never be returned. For instance, 118 - 236 should give -118, but on 8 unsigned bits it would give 138.
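A quick way to see that wrap-around (a minimal sketch; the variable names are just for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t a = 118, b = 236;

    /* Forcing the result back into 8 unsigned bits wraps around:
       118 - 236 = -118, and -118 modulo 256 is 138. */
    uint8_t diff8 = (uint8_t)(a - b);
    printf("8-bit difference: %d\n", diff8);   /* prints 138 */

    return 0;
}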
Thus the programmer decided to cast to int16_t, a signed 16-bit integer.

That could have worked, and given the correct negative/positive values (provided that the function returned int16_t instead of int8_t).

(Edit: as @zwol comments below, integer promotion is unavoidable, so this int16_t casting is not necessary.)
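The promotion is easy to check directly; this sketch assumes nothing beyond what the C standard guarantees (int is at least 16 bits, so it can hold every uint8_t value):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t a = 118, b = 236;

    /* The usual arithmetic conversions promote both uint8_t operands
       to int before the subtraction, so the result has type int... */
    printf("sizeof(a - b) = %zu, sizeof(int) = %zu\n",
           sizeof(a - b), sizeof(int));

    /* ...and it is already negative, with or without the int16_t casts. */
    printf("a - b = %d\n", a - b);                        /* -118 */
    printf("(int16_t)a - (int16_t)b = %d\n",
           (int16_t)a - (int16_t)b);                      /* also -118 */

    return 0;
}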
However, the final int8_t cast breaks the logic. Since the returned values may range from -255 to 255, some of these values will have their sign reversed after the cast to int8_t.

For instance, 255 - 0 gives a positive 255 (on 16 bits: lower 8 bits all 1, MSB 0), but in the int8_t world (signed 8-bit integer) it is negative, -1, since only the low 8 bits, binary 11111111, remain, and that pattern is -1 in two's complement.
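To see the sign flip concretely (a small sketch; converting 255 to int8_t is implementation-defined, but yields -1 on the usual two's-complement targets):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t wide   = 255 - 0;        /* 0x00FF: positive on 16 bits */
    int8_t  narrow = (int8_t)wide;   /* only the low 8 bits survive: 0xFF */

    printf("16-bit result: %d\n", wide);          /* prints 255 */
    printf("after int8_t cast: %d\n", narrow);    /* prints -1 */

    return 0;
}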
for ( ; *s1 == *s2; s1++, s2++)
    if (*s1 == '\0')
        return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
(Linux does it in assembly code...)
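For completeness, here is that loop wrapped into a compilable function with a tiny test; the name my_strcmp is made up here to avoid clashing with the standard library:

#include <stdio.h>

/* Returns <0, 0 or >0, like the standard strcmp. */
int my_strcmp(const char *s1, const char *s2)
{
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

int main(void)
{
    printf("%d\n", my_strcmp("abc", "abd"));   /* negative */
    printf("%d\n", my_strcmp("abc", "abc"));   /* 0 */
    printf("%d\n", my_strcmp("abd", "abc"));   /* positive */
    return 0;
}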
Actually, the difference must be done in at least 16 bits¹ for the obvious reason that the range of the result is -255 to 255 and that does not fit in 8 bits. However, sfstewman is correct in noting that it would happen due to implicit integer promotion anyway.
The eventual cast to 8 bits is incorrect, because it can overflow, as the range still does not fit in 8 bits. And anyway, strcmp is indeed supposed to return a plain int.
¹ 9 would suffice, but bits normally come in batches of 8.
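Putting both points together, a minimal corrected version of the code from the question would drop the casts and return plain int, letting integer promotion do the widening (a sketch; strcmp_fixed is a made-up name so it does not collide with the standard strcmp):

#include <stdint.h>

/* Same loop as in the question, but with an int return type, so the
   full -255..255 range of the difference fits without truncation. */
int strcmp_fixed(const uint8_t *s1, const uint8_t *s2)
{
    while (*s1 && (*s1 == *s2))
    {
        s1++;
        s2++;
    }
    /* Both operands are promoted to int before the subtraction,
       so no casts are needed. */
    return *s1 - *s2;
}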