I am confused by strcmp(), or rather, how it is defined by the standard. Consider comparing two strings where one contains characters outside the ASCII-7 range (0-127).
The C standard defines:
int strcmp(const char *s1, const char *s2);
The strcmp function compares the string pointed to by s1 to the string pointed to by s2.
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
The parameters are char *
. Not unsigned char *
. There is no notion that "comparison should be done as unsigned
".
But all the standard libraries I checked consider the "high" character to be just that, higher in value than the ASCII-7 characters.
I understand this is useful and the expected behaviour. I don't want to say the existing implementations are wrong or something. I just want to know, which part in the standard specs have I missed?
int strcmp_default( const char * s1, const char * s2 )
{
while ( ( *s1 ) && ( *s1 == *s2 ) )
{
++s1;
++s2;
}
return ( *s1 - *s2 );
}
int strcmp_unsigned( const char * s1, const char *s2 )
{
unsigned char * p1 = (unsigned char *)s1;
unsigned char * p2 = (unsigned char *)s2;
while ( ( *p1 ) && ( *p1 == *p2 ) )
{
++p1;
++p2;
}
return ( *p1 - *p2 );
}
#include <stdio.h>
#include <string.h>
int main()
{
char x1[] = "abc";
char x2[] = "abü";
printf( "%d\n", strcmp_default( x1, x2 ) );
printf( "%d\n", strcmp_unsigned( x1, x2 ) );
printf( "%d\n", strcmp( x1, x2 ) );
return 0;
}
Output is:
103
-153
-153
The strcmp() compares two strings character by character. If the strings are equal, the function returns 0.
A signed char is a signed value which is typically smaller than, and is guaranteed not to be bigger than, a short . An unsigned char is an unsigned value which is typically smaller than, and is guaranteed not to be bigger than, a short .
The strcmp() built-in function compares the string pointed to by string1 to the string pointed to by string2 The string arguments to the function must contain a NULL character ( \0 ) marking the end of the string.
strcmp compares two character strings ( str1 and str2 ) using the standard EBCDIC collating sequence. The return value has the same relationship to 0 as str1 has to str2 . If two strings are equal up to the point at which one terminates (that is, contains a null character), the longer string is considered greater.
7.21.4/1 (C99), emphasis is mine:
The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.
There is something similar in C90.
Note that strcoll() may be more adapted than strcmp() especially if you have character outside the basic character set.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With