Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimized strcmp implementation

This function was found here. It's an implementation of strcmp:

int strcmp(const char* s1, const char* s2)
{
    while (*s1 && (*s1 == *s2))
        s1++, s2++;
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

I understand all but the last line, in short what is going on in the last line?

like image 406
Cody Smith Avatar asked Nov 15 '13 15:11

Cody Smith


Video Answer


2 Answers

return *(const unsigned char*)s1-*(const unsigned char*)s2;

OP: in short what is going on in the last line?

A: The first potential string difference is compared. Both chars are referenced as unsigned char as required by the spec. The 2 are promoted to int and the difference is returned.


Notes:

1 The return value's sign (<0, 0, >0) is the most meaningful part. It is the only part that is specified by the C spec.

2 On some systems char is signed (more common). On others, char is unsigned. Defining the "sign-ness" of the last comparison promotes portability. Note that fgetc() obtains characters as unsigned char.

3 Other than that a string ends with a \0, the character encoding employed (like ASCII - most common), makes no difference at the binary level. If the first chars that differ in 2 strings have values 65 and 97, the first string will be less than the second, even if the character encoding is non-ASCII. OTOH, strcmp("A", "a") will return a negative number when character encoding is ASCII, but may return a positive number in a different character encoding for their underlying value and order are not defined by C.

like image 116
chux - Reinstate Monica Avatar answered Sep 28 '22 19:09

chux - Reinstate Monica


This implementation can be further optimized, shaving off some comparisons:

int strcmp(const char *s1, const char *s2) {
    unsigned char c1, c2;
    while ((c1 = *s1++) == (c2 = *s2++)) {
        if (c1 == '\0')
            return 0;
    }
    return c1 - c2;
}

The return value is 0 if the string are identical up to and including the terminating null byte. The sign of the return value is that of the difference between the first differing characters, converted to unsigned char as per the C Standard.

  • If char is smaller than int, which is true on all but some rare embedded systems, this difference can be computed with a simple subtraction, both c1 and c2 being promoted to int and this difference is guaranteed to fit in the range of type int.

  • On systems where sizeof(int) == 1, the return value should be computed this way:

    return (c1 < c2) ? -1 : 1;
    
like image 21
chqrlie Avatar answered Sep 28 '22 21:09

chqrlie