
Weird return value in strcmp [duplicate]

Tags: c, gcc, clang

While checking the return value of the strcmp function, I found some strange behavior in gcc. Here's my code:

#include <stdio.h>
#include <string.h>

char str0[] = "hello world!";
char str1[] = "Hello world!";

int main() {
    printf("%d\n", strcmp("hello world!", "Hello world!"));
    printf("%d\n", strcmp(str0, str1));
}

When I compile this with clang, both calls to strcmp return 32. However, when compiling with gcc, the first call returns 1, and the second call returns 32. I don't understand why the first and second calls to strcmp return different values when compiled using gcc.

Below is my test environment.

  • Ubuntu 18.04 64bit
  • gcc 7.3.0
  • clang 6.0.0
fips197 asked Sep 14 '18 14:09




3 Answers

It looks like you didn't enable optimizations (e.g. -O2).

From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.

That's where the difference comes from: The code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.

Live example: https://godbolt.org/z/8Hg-gI

As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.

melpomene answered Oct 20 '22 22:10


The standard defines the result of strcmp to be a negative value if lhs appears before rhs in lexicographical order, zero if they are equal, or a positive value if lhs appears after rhs.

It's up to the implementation how to implement that and what exactly to return. You must not depend on a specific value in your programs, or they won't be portable. Simply check with comparisons (<, >, ==).

See https://en.cppreference.com/w/c/string/byte/strcmp

Background

One simple implementation might compute the difference c1 - c2 for each pair of characters and continue until the result is nonzero or one of the strings ends. The result is then the numeric difference between the first pair of characters at which the two strings differ. In the question's example that is 'h' - 'H' = 104 - 72 = 32 in ASCII, which matches the value returned by the library call.

For example, see this glibc implementation: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;hb=HEAD

Benjamin Maurer answered Oct 20 '22 23:10


The strcmp function is only specified to return a value greater than zero, zero, or less than zero. Nothing specifies what the exact positive and negative values have to be.

Some programmer dude answered Oct 20 '22 22:10