Please note that I have looked at the existing questions with similar titles, but as far as I can tell they do not address this question.
Initially I thought that program1 and program2 would give me the same result.
//Program 1
char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));
//Output: -4
//Program 2
printf("%d", strcmp("abcd", "efgh"));
//Output: -1
The only difference I can spot is that in program2 I passed string literals, while in program1 I passed char * variables as the arguments of the strcmp() function.
Why is there a difference in behaviour between these seemingly identical programs?
Platform: Linux Mint. Compiler: g++.
Edit: Actually, program1 always prints the difference between the ASCII codes of the first mismatched characters, but program2 prints -1 if the ASCII code of the first mismatched character in string2 is greater than that in string1, and vice versa.
This is your C code:
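#include <stdio.h>
#include <string.h>

void x1(void)
{
    char *a = "abcd";
    char *b = "efgh";
    printf("%d", strcmp(a,b));   // arguments are pointer variables
}

void x2(void)
{
    printf("%d", strcmp("abcd", "efgh"));   // arguments are string literals
}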
And this is the generated assembly output for both functions:
.LC0:
        .string "abcd"
.LC1:
        .string "efgh"
.LC2:
        .string "%d"
x1:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], OFFSET FLAT:.LC0
        mov     QWORD PTR [rbp-16], OFFSET FLAT:.LC1
        mov     rdx, QWORD PTR [rbp-16]
        mov     rax, QWORD PTR [rbp-8]
        mov     rsi, rdx
        mov     rdi, rax
        call    strcmp          // the strcmp function is actually called
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        leave
        ret
x2:
        push    rbp
        mov     rbp, rsp
        mov     esi, -1         // strcmp is never called, the compiler
                                // knows what the result will be and it just
                                // uses -1
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret
When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".
But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
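The two code paths can be reproduced side by side with a small sketch. The helper names cmp_via_call and cmp_literals are made up for illustration, and reading strcmp through a volatile function pointer is just one assumed way to stop the compiler from folding the call. Whether the literal version is actually folded depends on the compiler and its flags, so only the sign of each result should be relied on.

```c
#include <string.h>

/* Hypothetical helper: the function pointer is volatile, so the compiler
   has to load it and make a real call instead of folding the result. */
static int cmp_via_call(const char *lhs, const char *rhs)
{
    int (*volatile fn)(const char *, const char *) = strcmp;
    return fn(lhs, rhs);
}

/* Hypothetical helper: both arguments are literals, so the compiler is
   allowed (but not required) to replace the call with a constant. */
static int cmp_literals(void)
{
    return strcmp("abcd", "efgh");
}
```

Both helpers return a negative value for these inputs. The exact numbers may differ, which is the effect described in the question.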
With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.
It is indeed surprising that strcmp returns 2 different values for these calls, but it is not incompatible with the C Standard: strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
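As a rough illustration of where -4 can come from and how to write code that does not depend on it, here is a minimal sketch. naive_strcmp is a hypothetical byte-by-byte implementation, not glibc's actual code, and cmp_sign is a made-up helper name. The value -4 assumes an ASCII-compatible character set, where 'a' is 97 and 'e' is 101.

```c
#include <string.h>

/* Hypothetical reference implementation: returns the difference between
   the first pair of bytes that differ, compared as unsigned char. */
static int naive_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return (int)*p1 - (int)*p2;
}

/* Hypothetical helper: collapse any strcmp result to -1, 0 or 1, so the
   caller sees the same value whichever way the compiler handled the call. */
static int cmp_sign(const char *lhs, const char *rhs)
{
    int r = strcmp(lhs, rhs);
    return (r > 0) - (r < 0);
}
```

With ASCII, naive_strcmp("abcd", "efgh") gives 'a' - 'e' = -4, while cmp_sign("abcd", "efgh") gives -1 regardless of the magnitude strcmp happens to return. Portable code should only test the result against zero.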
As pointed out by others, the code generated for the two calls is different: the first case makes an actual call to strcmp at run time, while the compiler evaluates the comparison at compile time and emits a hard-coded -1 for the second case where both arguments are string literals.
In order to perform this compile time evaluation, strcmp must be defined in a subtle way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult, with a number of nested macros eventually leading to a hidden prototype.
Note that more recent versions of both gcc and clang will perform the optimisation in both cases, as can be tested on Godbolt Compiler Explorer, but neither combines this optimisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.