
Why printf("%s",(char[]){'H','i','\0'}) works as printf("%s","Hi"), but printf("%s",(char*){'H','i','\0'}); fails? [duplicate]

I really need help with this. It has shaken my very foundation in C. Long and detailed answers will be very much appreciated. I have divided my question into two parts.

A: Why does printf("%s",(char[]){'H','i','\0'}); work and print Hi just as the conventional printf("%s","Hi"); does? Can we use (char[]){'H','i','\0'} as a substitute for "Hi" anywhere in our C code? Do they mean the same thing? I mean, when we write "Hi" in C, it generally means Hi is stored somewhere in memory and a pointer to it is passed. Can the same be said of the seemingly ugly (char[]){'H','i','\0'}? Are they exactly the same?

B: When printf("%s",(char[]){'H','i','\0'}) works, the same as printf("%s","Hi"), why does printf("%s",(char*){'A','B','\0'}); fail big time and seg-fault if I run it despite the warnings? It amazes me, because in C, isn't char[] supposed to decay into char*, as when we pass it as a function argument? Why is that not happening here, and why does char* fail? I mean, isn't passing char demo[] as an argument to a function the same as char *demo? Why are the results not the same here?

Please help me out with this. I feel like I haven't yet understood even the very basics of C. I am very disappointed. Thank you!!

Thokchom asked May 17 '13 17:05


4 Answers

Your third example:

printf("%s",(char *){'H','i','\0'});

isn't even legal (strictly speaking it's a constraint violation), and you should have gotten at least one warning when compiling it. When I compiled it with gcc with default options, I got 6 warnings:

c.c:3:5: warning: initialization makes pointer from integer without a cast [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]
c.c:3:5: warning: excess elements in scalar initializer [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]
c.c:3:5: warning: excess elements in scalar initializer [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]

The second argument to printf is a compound literal. It's legal (but odd) to have a compound literal of type char*, but in this case the initializer-list portion of the compound literal is invalid.

After printing the warnings, what gcc seems to be doing is (a) converting the expression 'H', which is of type int, to char*, yielding a garbage pointer value, and (b) ignoring the remainder of the initializer elements, 'i' and '\0'. The result is a char* pointer value that points to the (probably virtual) address 0x48 -- assuming an ASCII-based character set.

Ignoring excess initializers is valid (but worthy of a warning), but there is no implicit conversion from int to char* (apart from the special case of a null pointer constant, which doesn't apply here). gcc has done its job by issuing a warning, but it could (and IMHO should) have rejected it with a fatal error message. It will do so with the -pedantic-errors option.

If your compiler warned you about those lines, you should have included those warnings in your question. If it didn't, either crank up the warning level or get a better compiler.

Going into more detail about what happens in each of the three cases:

printf("%s","Hi");

A C string literal like "%s" or "Hi" creates an anonymous statically allocated array of char. (This object is not const, but attempting to modify it has undefined behavior; this isn't ideal, but there are historical reasons for it.) A terminating '\0' null character is added to make it a valid string.

An expression of array type, in most contexts (the exceptions are when it's the operand of the unary sizeof or & operator, or when it's a string literal in an initializer used to initialize an array object) is implicitly converted to ("decays to") a pointer to the array's first element. So the two arguments passed to printf are of type char*; printf uses those pointers to traverse the respective arrays.

printf("%s",(char[]){'H','i','\0'});

This uses a feature that was added to the language by C99 (the 1999 edition of the ISO C standard), called a compound literal. It's similar to a string literal, in that it creates an anonymous object and refers to the value of that object. A compound literal has the form:

( type-name ) { initializer-list }

and the object has the specified type and is initialized to the value given by the initializer list.

The above is nearly equivalent to:

char anon[] = {'H', 'i', '\0'};
printf("%s", anon);

Again, the second argument to printf refers to an array object, and it "decays" to a pointer to the array's first element; printf uses that pointer to traverse the array.

Finally, this:

printf("%s",(char*){'A','B','\0'});

as you say, fails big time. The type of a compound literal is usually an array or structure (or union); it actually hadn't occurred to me that it could be a scalar type such as a pointer. The above is nearly equivalent to:

char *anon = {'A', 'B', '\0'};
printf("%s", anon);

Obviously anon is of type char*, which is what printf expects for a "%s" format. But what's the initial value?

The standard requires the initializer for a scalar object to be a single expression, optionally enclosed in curly braces. But for some reason, that requirement is under "Semantics", so violating it is not a constraint violation; it's merely undefined behavior. That means the compiler can do anything it likes, and may or may not issue a diagnostic. The authors of gcc apparently decided to issue a warning and ignore all but the first initializer in the list.

After that, it becomes equivalent to:

char *anon = 'A';
printf("%s", anon);

The constant 'A' is of type int (for historical reasons, it's int rather than char, but the same argument would apply either way). There is no implicit conversion from int to char*, and in fact the above initializer is a constraint violation. That means a compiler must issue a diagnostic (gcc does), and may reject the program (gcc doesn't unless you use -pedantic-errors). Once the diagnostic is issued, the compiler can do whatever it likes; the behavior is undefined (there's some language-lawyerly disagreement on that point, but it doesn't really matter). gcc chooses to convert the value of 'A' from int to char* (probably for historical reasons, going back to when C was even less strongly typed than it is today), resulting in a garbage pointer with a representation that probably looks like 0x00000041 or 0x0000000000000041.

That garbage pointer is then passed to printf, which tries to use it to access a string at that location in memory. Hilarity ensues.

There are two important things to keep in mind:

  1. If your compiler prints warnings, pay close attention to them. gcc in particular issues warnings for many things that IMHO should be fatal errors. Never ignore warnings unless you understand what the warning means, thoroughly enough for your knowledge to override that of the authors of the compiler.

  2. Arrays and pointers are very different things. Several rules of the C language seemingly conspire to make it look like they're the same. You can temporarily get away with assuming that arrays are nothing more than pointers in disguise, but that assumption will eventually come back to bite you. Read section 6 of the comp.lang.c FAQ; it explains the relationship between arrays and pointers better than I can.

Keith Thompson answered Sep 22 '22 01:09


Regarding snippet #2:

The code works because of a new feature in C99, called compound literals. You can read about them in several places, including GCC's documentation, Mike Ash's article, and a bit of google searching.

Essentially, the compiler creates a temporary array on the stack and fills it with 3 bytes: 0x48, 0x69, and 0x00. That temporary array, once created, decays to a pointer, which is passed to the printf function. A very important thing to note about compound literals is that they are not const by default: unlike a string literal, the array may be modified.

Regarding snippet #3:

You're actually not creating an array at all. You are converting the first element of the scalar initializer, which in this case is 'H', or 0x48, into a pointer. You can see that by changing the %s in your printf statement to %p, which gives this output for me:

0x48

As such, you must be very careful with what you do with compound literals - they're a powerful tool, but it's easy to shoot yourself in the foot with them.

Richard J. Ross III answered Sep 20 '22 01:09


(Ok ... someone reworked the question completely. Reworking the answer.)

The #3 array contains these hex bytes (we don't know about the 4th one):

48 69 00 xx

When it passes the contents of that array, in case #3 only, it takes those bytes as the address of the string to print. It depends on how those 4 bytes convert to a pointer on your actual CPU hardware, but let's say "486900FF" is the address (we'll guess the 4th byte is 0xFF; we are making this all up anyway). We are also assuming a pointer is 4 bytes long, a particular endian order, and so on. It doesn't matter to the answer, but others are free to expound.

Note: One of the other answers seems to think it takes the 0x48, extends it to an (int) 0x00000048, and calls that a pointer. Could be. But even if GCC does that (and @KeithThompson didn't say he checked the generated code), it doesn't mean some other C compiler would do the same thing. The result is the same either way.

That gets passed to the printf() function and it tries to go to that address to get some characters to print. (Seg fault happens because that address maybe isn't present on the machine and isn't assigned to your process for reading anyway.)

In case #2 it knows it's an array and not a pointer, so it passes the address of the memory where the bytes are stored, and printf() can read the characters from there.

See other answers for more formal language.

One thing to think about is that at least some C compilers probably don't distinguish a call to printf from a call to any other function. So the compiler takes the "format string" and stores away a pointer for the call (which happens to point to a string), then takes the 2nd parameter and stores away whatever it gets according to the declaration of the function, whether an int, a char, or a pointer. The function then pulls these out of wherever the caller put them according to that same declaration. The declaration for the 2nd and later parameters has to be something really generic to be able to accept a pointer, an int, a double, and all the other types that could be there. (What I'm saying is that the compiler probably doesn't look at the format string when deciding what to do with the 2nd and following parameters.)

It might be interesting to see what happens for:

printf("%s",{'H','i','\0'});
printf("%s",(char *)(char[]){'H','i','\0'}); // This works according to @DanielFischer

Predictions?

Lee Meador answered Sep 20 '22 01:09


In each case, the compiler creates an initialized object of type char[3]. In the first case, it treats the object as an array, so it passes a pointer to its first element to the function. In the second case, it treats the object as a pointer, so it passes the value of the object. printf is expecting a pointer, and the value of the object is invalid when treated as a pointer, so the program crashes at runtime.

William Pursell answered Sep 21 '22 01:09