I'm learning C and am currently studying string handling. The book I'm studying from defines strcmp() as follows:
This is a function which compares two strings to find out whether they are same or different. The two strings are compared character by character until there is a mismatch or end of one of the strings is reached, whichever occurs first. If the two strings are identical, strcmp( ) returns a value zero. If they’re not, it returns the numeric difference between the ASCII values of the first non-matching pairs of characters.
The book gives a sample program, which is what my question is about:
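#include <stdio.h>   /* printf */
#include <string.h>  /* strcmp */

int main( void )
{
char string1[ ] = "Jerry" ;
char string2[ ] = "Ferry" ;
int i, j, k ;
i = strcmp ( string1, "Jerry" ) ;      /* identical strings */
j = strcmp ( string1, string2 ) ;      /* first characters differ: 'J' vs 'F' */
k = strcmp ( string1, "Jerry boy" ) ;  /* '\0' vs ' ' at index 5 */
printf ( "\n%d %d %d", i, j, k ) ;
return 0 ;
}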
I ran this program in Dev-C++ on my 64-bit Windows machine and got this output: 0 1 -1
The book, however, gives the output as 0 4 -32, with this reasoning:
In the first call to strcmp( ), the two strings are identical—“Jerry” and “Jerry”—and the value returned by strcmp( ) is zero. In the second call, the first character of “Jerry” doesn't match with the first character of “Ferry” and the result is 4, which is the numeric difference between ASCII value of ‘J’ and ASCII value of ‘F’. In the third call to strcmp( ) “Jerry” doesn’t match with “Jerry boy”, because the null character at the end of “Jerry” doesn’t match the blank in “Jerry boy”. The value returned is -32, which is the value of null character minus the ASCII value of space, i.e., ‘\0’ minus ‘ ’, which is equal to -32.
To confirm what the book says, I added this code to my program, just to verify the ASCII difference between J and F:
printf("\n Ascii value of J is %d", 'J' );
printf("\n Ascii value of F is %d", 'F' );
and got this output:
Ascii value of J is 74
Ascii value of F is 70
This agrees with what the book says. However, as you can see, I get different values for j and k, that is, for the cases where the strings don't match. I looked for similar questions on SO and found some, but could not find a definite answer for the different output (where it returns 1 and -1), so I decided to ask a new question.
This question here seems to be somewhat similar, and its description contains the following information about strcmp():
The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2
In one of the answers, I came across this link, which documents the behavior of strcmp(). It further says:
The strcmp() function shall compare the string pointed to by s1 to the string pointed to by s2.
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.
RETURN VALUE
Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.
So, after reading all this, I'm inclined to think that, irrespective of the implementation or platform being used, the return value of strcmp() should only be treated as falling into one of three categories (zero, positive, or negative), instead of relying on the exact value being returned.
Am I correct in my understanding?
Here is a simple implementation of strcmp() in C, from Apple's libc, which returns only -1, 0, or +1:
int
strcmp(const char *s1, const char *s2)
{
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}
FreeBSD's libc implementation:
int
strcmp(const char *s1, const char *s2)
{
while (*s1 == *s2++)
if (*s1++ == '\0')
return (0);
return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
Here is the implementation from GNU libc, which returns the difference between characters:
int
strcmp (p1, p2)
const char *p1;
const char *p2;
{
const unsigned char *s1 = (const unsigned char *) p1;
const unsigned char *s2 = (const unsigned char *) p2;
unsigned char c1, c2;
do
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0')
return c1 - c2;
}
while (c1 == c2);
return c1 - c2;
}
That's why most comparisons I've seen are written as < 0, == 0, and > 0 when the code does not need to know the exact difference between the characters in the strings.