
Unexpected printf output [duplicate]

I just discovered very weird behavior from the C compiler. It's very simple code. I tried it in many online C compilers, but the result is always the same, which is driving me insane.

#include <stdio.h>

int main()
{
    char Buffer[10] = "0123456789";
    char ID[5] = "abcde";
    printf("%s",ID);

    return 0;
}

Take your time and try to predict the output of printf. If you're a human like me, the most obvious answer is "abcde", which is not correct! But if you somehow came up with "abcde0123456789", then you're consuming electricity to live.

How, just how, is that possible? I'm only selecting the ID array to be printed, so WHY is the Buffer one printed with it too? It doesn't make sense. Even the ID array isn't big enough to fit all that data. I'm really losing my mind here.

Vertinhol asked Mar 20 '26 21:03


2 Answers

The format specification %s expects a pointer to a string: a sequence of characters terminated by the zero character '\0'.

However, both arrays

char Buffer[10] = "0123456789";
char ID[5] = "abcde";

do not contain strings. So the call of printf invokes undefined behavior.

You should write

char Buffer[] = "0123456789";
char ID[] = "abcde";

or

char Buffer[11] = "0123456789";
char ID[6] = "abcde";

Note that string literals are stored as character arrays with an additional terminating zero character '\0'.

For example this declaration

char ID[] = "abcde";

in fact is equivalent to

char ID[] = { 'a', 'b', 'c', 'd', 'e', '\0' };

and this declaration

char ID[5] = "abcde";

is equivalent to

char ID[5] = { 'a', 'b', 'c', 'd', 'e' };

That is, in the last case the terminating zero character '\0' of the string literal is not used as an initializer of the array ID.

If you want to output a character array that does not contain a string, you can use the precision field, for example

printf( "%.5s\n", ID );

or

printf( "%.*s\n", 5, ID );

or

printf( "%.*s\n", ( int )sizeof( ID ), ID );

Also bear in mind that, unlike in C, in C++ a declaration like

char ID[5] = "abcde";

is invalid. In C++ you may not ignore the terminating zero character '\0' of a string literal used as an initializer. Otherwise the number of initializers will exceed the number of initialized array elements.

Vlad from Moscow answered Mar 22 '26 10:03


The behavior of printf is undefined because ID is not a string, that is, a null-terminated char array. Given the %s format specifier, printf relies on the null terminator to know where to stop printing. Since there is no null byte in ID, it overruns the array into adjacent memory looking for one and prints whatever it finds there. It just so happens that the other char array, Buffer, sits in that region, so that is what gets printed. It could be something else entirely, including the expected result if by chance the first byte after ID happened to be a null byte. Note the definition of undefined behavior:

Behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment [...]


The majority of the compilers and versions I tested do behave as you describe and print both arrays in sequence, but not all of them. It's not a pattern you can rely on, as you can see here:

https://godbolt.org/z/1E396Y3KG (gcc with optimization)

Or here:

https://godbolt.org/z/roa6GxWvr (msvc)

The result is not always abcde0123456789.


As for why there is no null terminator ('\0'): there is not enough room for it. If you declare the array with one extra element, the compiler appends it automatically:

char ID[6] = "abcde"; //will automatically append \0 to the char array
        ^

Omitting the size is actually better practice: the compiler deduces the needed size without you having to count the characters, so it is less error-prone:

char ID[] = "abcde";

anastaciu answered Mar 22 '26 11:03


