While checking the return value of the strcmp function, I found some strange behavior in gcc. Here's my code:
#include <stdio.h>
#include <string.h>

char str0[] = "hello world!";
char str1[] = "Hello world!";

int main() {
    printf("%d\n", strcmp("hello world!", "Hello world!"));
    printf("%d\n", strcmp(str0, str1));
}
When I compile this with clang, both calls to strcmp return 32. However, when compiling with gcc, the first call returns 1 and the second call returns 32. I don't understand why the first and second calls to strcmp return different values when compiled with gcc.
Below is my test environment.
strcmp compares two strings character by character. It returns 0 if the two strings are identical, a value less than 0 if the first string compares less than the second, and a value greater than 0 if the first string compares greater than the second.
Using strncmp you can limit the comparison to a maximum number of characters, so that it doesn't read past accessible memory. That does not mean strcmp is insecure to use; both functions work well when used as intended.
It looks like you didn't enable optimizations (e.g. -O2).
From my tests, it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.
That's where the difference comes from: the code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.
Live example: https://godbolt.org/z/8Hg-gI
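To see the library's own return value regardless of how the compiler treats constant arguments, one option is to hide the call behind a volatile function pointer. This is only a sketch: the helper name strcmp_no_fold is made up for illustration, and the exact nonzero value it returns still depends on the C library in use.

```c
#include <string.h>

/* Hypothetical helper (not part of any library): calls strcmp through a
 * volatile function pointer. The pointer must be read at run time, so the
 * compiler cannot replace the call with a precomputed constant. */
int strcmp_no_fold(const char *lhs, const char *rhs)
{
    int (*volatile cmp)(const char *, const char *) = strcmp;
    return cmp(lhs, rhs);
}
```

Calling strcmp_no_fold("hello world!", "Hello world!") should then give whatever the library function itself returns, which is only guaranteed to be positive. With gcc, compiling with -fno-builtin-strcmp is another way to keep the call from being folded.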
As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.
The standard defines the result of strcmp to be negative if lhs appears before rhs in lexical order, zero if they are equal, or a positive value if lhs appears lexically after rhs.
It's up to the implementation how to implement that and what exactly to return. You must not depend on a specific value in your programs, or they won't be portable. Simply check with comparisons (<, >, ==).
See https://en.cppreference.com/w/c/string/byte/strcmp
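If a program really needs a fixed value rather than just a sign test, it can normalize the result itself. The following is a minimal sketch; compare_sign is a hypothetical helper name, not a standard function.

```c
#include <string.h>

/* Hypothetical helper: collapses any strcmp result to -1, 0, or 1,
 * so the caller never depends on the implementation's exact value. */
int compare_sign(const char *lhs, const char *rhs)
{
    int r = strcmp(lhs, rhs);
    /* (r > 0) and (r < 0) each evaluate to 0 or 1. */
    return (r > 0) - (r < 0);
}
```

With this wrapper, the two calls from the question would produce the same result under both gcc and clang, because only the sign of the strcmp result is used.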
Background
One simple implementation might just calculate the difference of each pair of characters, c1 - c2, and continue until the result is not zero or one of the strings ends. The result is then the numeric difference between the first pair of characters at which the two strings differ.
For example, this glibc implementation: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;hb=HEAD
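The subtraction approach can be sketched roughly as follows. This is an illustration written for this answer, not glibc's actual code, and the name simple_strcmp is made up.

```c
/* Hypothetical name; a sketch of the character-difference approach. */
int simple_strcmp(const char *lhs, const char *rhs)
{
    /* Compare as unsigned char, which is how strcmp is specified to
     * interpret the characters. */
    const unsigned char *a = (const unsigned char *)lhs;
    const unsigned char *b = (const unsigned char *)rhs;

    /* Advance while the characters match and neither string has ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }

    /* Difference of the first mismatching pair, or 0 if both ended. */
    return (int)*a - (int)*b;
}
```

In ASCII, 'h' is 104 and 'H' is 72, so simple_strcmp("hello world!", "Hello world!") returns 32, the same value the question observed from the library call.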
The strcmp function is only specified to return a value greater than zero, zero, or less than zero. Nothing specifies what those positive and negative values have to be.