Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C strcmp implementation using subtraction of characters

Tags:

c

I saw this implementation of strcmp a while back, and I have a question for purely education purposes. Why is it needed to convert the inputs to 16bit integers, do the math and then convert back to 8bit? What is wrong with doing the subtraction in 8bit?

int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
  while ( *s1 && (*s1 == *s2) )
  {
    s1++; 
    s2++;
  }

  return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}

Note: the code assumes 16 bit int type.

EDIT: It was mentioned that C does conversion to int (suppose 32bit) by default. Is that the case even when the code explicitly states to cast to 16bit int ?

like image 566
madski Avatar asked Jan 18 '16 16:01

madski


2 Answers

The strcmp(a,b) function is expected to return

  • <0 if string a < string b
  • >0 if string a > string b
  • 0 if string a == string b

The test is actually made on the first char being different in the two strings at the same location (0, the string terminator, works as well).

Here since the function takes two uint8_t (unsigned char), the developer was probably worrying about doing a comparison on two unsigned chars would give a number between 0 and 255, hence a negative value would never be returned. For instance, 118 - 236 would return -118, but on 8 bits it would return 138.

Thus the programmer decided to cast to int_16, signed integer (16 bits).

That could have worked, and given the correct negative/positive values (provided that the function returns int_16 instead of int_8).

(*edit: comment from @zwol below, the integer promotion is unavoidable, thus this int16_t casting is not necessary)

However the final int_8 cast breaks the logic. Since returned values may be from -255 to 255, some of these values will see their sign reversed after the cast to int_8.

For instance, doing 255 - 0 gives the positive 255 (on 16 bits, all lower 8 bits to 1, MSB to 0) but in the int_8 world (signed int of 8 bits) this is negative, -1, since we only have the last low 8 bits set to binary 11111111, or decimal -1.


Definitely not a good programming example.

That working function from Apple is better
for ( ; *s1 == *s2; s1++, s2++)
    if (*s1 == '\0')
        return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);

(Linux does it in assembly code...)

like image 82
Déjà vu Avatar answered Nov 13 '22 23:11

Déjà vu


Actually, the difference must be done in at least 16 bits¹ for the obvious reason that the range of the result is -255 to 255 and that does not fit in 8 bits. However, sfstewman is correct in noting that it would happen due to implicit integer promotion anyway.

The eventual cast to 8 bits is incorrect, because it can overflow as the range still does not fit in 8 bits. And anyway, strcmp is indeed supposed to return plain int.


¹ 9 would suffice, but bits normally come in batches of 8.

like image 9
Jan Hudec Avatar answered Nov 13 '22 22:11

Jan Hudec