
Ambiguous behaviour of strcmp()

Tags: c, string, strcmp

Please note that I have checked the existing questions with similar titles, but as far as I can tell they do not address this one.

Initially I thought that program1 and program2 would give me the same result.

//Program 1

char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));


//Output: -4

//Program 2
printf("%d", strcmp("abcd", "efgh"));

//Output: -1

The only difference I can spot is that in program2 I passed string literals, while in program1 I passed char * variables as the arguments of strcmp().

Why is there a difference between the behaviour of these seemingly identical programs?

Platform: Linux Mint; compiler: g++

Edit: Actually, program1 always prints the difference between the ASCII codes of the first mismatched characters, whereas program2 prints -1 if the first mismatched character in string2 has a greater ASCII code than the one in string1, and vice versa.

asked Sep 17 '25 18:09 by u_sre


2 Answers

This is your C code:

#include <stdio.h>
#include <string.h>

void x1(void)
{
  char *a = "abcd";
  char *b = "efgh";
  printf("%d", strcmp(a,b));
}

void x2(void)
{
  printf("%d", strcmp("abcd", "efgh"));
}

And this is the generated assembly output for both functions:

.LC0:
        .string "abcd"
.LC1:
        .string "efgh"
.LC2:
        .string "%d"
x1:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], OFFSET FLAT:.LC0
        mov     QWORD PTR [rbp-16], OFFSET FLAT:.LC1
        mov     rdx, QWORD PTR [rbp-16]
        mov     rax, QWORD PTR [rbp-8]
        mov     rsi, rdx
        mov     rdi, rax
        call    strcmp              // the strcmp function is actually called
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        leave
        ret

x2:
        push    rbp
        mov     rbp, rsp
        mov     esi, -1             // strcmp is never called, the compiler
                                    // knows what the result will be and it just
                                    // uses -1
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret

When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".

But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.
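To see where the -4 could come from, here is a minimal sketch of a byte-difference comparison. It is not the actual glibc implementation; diff_strcmp is a hypothetical name used only for illustration, and the sketch assumes an ASCII character set.

```c
/* Hypothetical stand-in for a library strcmp that returns the
 * difference between the first pair of mismatched bytes.
 * This is an illustration, not the real C library code. */
static int diff_strcmp(const char *s1, const char *s2)
{
    /* The standard compares characters as unsigned char. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while both strings agree and neither has ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Negative, zero or positive; the magnitude is the byte difference. */
    return (int)*p1 - (int)*p2;
}
```

With this sketch, diff_strcmp("abcd", "efgh") stops at the first characters, 'a' (97) and 'e' (101), and returns 97 - 101 = -4. A compiler that folds the call at compile time is free to pick any other negative value, such as -1.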

With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.

answered Sep 19 '25 07:09 by Jabberwocky


It is indeed surprising that strcmp returns two different values for these calls, but it is not incompatible with the C Standard:

strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
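Since only the sign is guaranteed, portable code should test the sign and never a specific value. A minimal sketch, where strcmp_sign and str_before are hypothetical helper names and not part of any library:

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp result to -1, 0 or 1,
 * so the value no longer depends on the compiler or the library. */
static int strcmp_sign(const char *s1, const char *s2)
{
    int r = strcmp(s1, s2);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: nonzero if s1 sorts strictly before s2. */
static int str_before(const char *s1, const char *s2)
{
    return strcmp(s1, s2) < 0;
}
```

Both helpers give the same answer whether the underlying call produces -4 or -1, because they look only at the sign of the result.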

As pointed out by others, the code generated for the two calls is different:

  • the compiler generates a call to the library function in the first program
  • the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case where both arguments are string literals.

In order to perform this compile-time evaluation, strcmp must be defined in a subtle way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult, with a number of nested macros eventually leading to a hidden prototype.

Note that more recent versions of both gcc and clang will perform the optimisation in both cases, as can be tested on Godbolt's Compiler Explorer, but neither combines this optimisation with that of printf to generate the even more compact code puts("-1");. They seem to convert printf to puts only for string literal formats without arguments.

answered Sep 19 '25 06:09 by chqrlie