
strcmp() and signed / unsigned chars

Tags: c, standards

I am confused by strcmp(), or rather, how it is defined by the standard. Consider comparing two strings where one contains characters outside the ASCII-7 range (0-127).

The C standard defines:

int strcmp(const char *s1, const char *s2);

The strcmp function compares the string pointed to by s1 to the string pointed to by s2.

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

The parameters are char *, not unsigned char *. Nothing in this text says that the comparison should be done as unsigned.

But all the standard libraries I checked consider the "high" character to be just that, higher in value than the ASCII-7 characters.

I understand this is useful and the expected behaviour. I am not saying the existing implementations are wrong. I just want to know: which part of the standard have I missed?

int strcmp_default( const char * s1, const char * s2 )
{
    while ( ( *s1 ) && ( *s1 == *s2 ) )
    {
        ++s1;
        ++s2;
    }
    return ( *s1 - *s2 );
}

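int strcmp_unsigned( const char * s1, const char * s2 )
{
    /* Reinterpret the bytes as unsigned char without casting away const. */
    const unsigned char * p1 = (const unsigned char *)s1;
    const unsigned char * p2 = (const unsigned char *)s2;

    while ( ( *p1 ) && ( *p1 == *p2 ) )
    {
        ++p1;
        ++p2;
    }
    /* Both operands promote to int, so the difference can be negative. */
    return ( *p1 - *p2 );
}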

#include <stdio.h>
#include <string.h>

int main()
{
    char x1[] = "abc";
    char x2[] = "abü";
    printf( "%d\n", strcmp_default( x1, x2 ) );
    printf( "%d\n", strcmp_unsigned( x1, x2 ) );
    printf( "%d\n", strcmp( x1, x2 ) );
    return 0;
}

Output is:

103
-153
-153
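
The first two numbers can be reproduced with plain arithmetic, assuming the source file was saved as ISO-8859-1 (so that ü is the single byte 0xFC) and that plain char is a signed 8-bit type. Neither is stated above, so both are assumptions, and the helper names below are invented for this sketch:

```c
/* Hypothetical helper: the difference a plain-char comparison computes
   on a platform where char is a signed 8-bit type (an assumption). */
int diff_as_signed(unsigned char a, unsigned char b)
{
    /* 0xFC converts to -4 on the usual two's-complement platforms;
       strictly speaking that conversion is implementation-defined. */
    return (signed char)a - (signed char)b;
}

/* Hypothetical helper: the difference when both bytes are read as
   unsigned char. Both operands promote to int before subtracting. */
int diff_as_unsigned(unsigned char a, unsigned char b)
{
    return a - b;
}
```

With 'c' equal to 99 (ASCII), diff_as_signed('c', 0xFC) is 99 - (-4) = 103 and diff_as_unsigned('c', 0xFC) is 99 - 252 = -153. The exact value returned by the library strcmp() is not guaranteed, only its sign.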
DevSolar asked Aug 31 '09 09:08



1 Answer

7.21.4/1 (C99); the key phrase is "both interpreted as unsigned char":

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

There is something similar in C90.
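
A minimal sketch of a comparison that follows the quoted wording. The names strcmp_sketch and sign_of are made up for illustration, and this is not a claim about how any particular library implements strcmp():

```c
#include <string.h>

/* Hypothetical illustration: compare the bytes as unsigned char,
   in line with the wording of C99 7.21.4/1. */
int strcmp_sketch(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the strings agree and s1 has not ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }

    /* Only the sign is specified by the standard, so return -1, 0 or 1
       instead of the raw difference. */
    return (*p1 > *p2) - (*p1 < *p2);
}

/* Hypothetical helper: reduce any int to -1, 0 or 1 so that results
   from different implementations can be compared. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}
```

For example, sign_of(strcmp("abc", "ab\xfc")) and strcmp_sketch("abc", "ab\xfc") should both be negative, because 0xFC read as unsigned char is 252, which is greater than 'c'.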

Note that strcoll() may be better suited than strcmp(), especially if you have characters outside the basic character set.
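
A rough sketch of how strcoll() might be used. The helper name collate_in_locale and its -2 error value are invented here. Which locale names are installed varies by system, so a name such as "de_DE.UTF-8" is only an assumption:

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare two strings with the collation rules of
   the named locale. Returns -1, 0 or 1, or -2 if the locale cannot be
   selected. Note that it changes the program-wide LC_COLLATE setting. */
int collate_in_locale(const char *locale, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -2;  /* locale not available on this system */

    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

In the "C" locale the result is expected to match the byte-wise ordering of strcmp(). In a language-specific locale, for instance collate_in_locale("de_DE.UTF-8", x1, x2), the ordering follows that locale's rules instead, provided the strings are encoded the way the locale expects.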

AProgrammer answered Nov 16 '22 02:11