 

Returning struct containing array


The following simple code segfaults under gcc 4.4.4

    #include <stdio.h>

    typedef struct Foo Foo;
    struct Foo {
        char f[25];
    };

    Foo foo(){
        Foo f = {"Hello, World!"};
        return f;
    }

    int main(){
        printf("%s\n", foo().f);
    }

Changing the final line to

    Foo f = foo();
    printf("%s\n", f.f);

works fine. Both versions work when compiled with -std=c99. Am I simply invoking undefined behavior, or has something in the standard changed that permits the code to work under C99? Why does it crash under C89?

Dave asked Jan 08 '12 01:01




2 Answers

I believe the behavior is undefined both in C89/C90 and in C99.

foo().f is an expression of array type, specifically char[25]. C99 6.3.2.1p3 says:

Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

The problem in this particular case (an array that's a member of a structure returned by a function) is that there is no "array object". Function results are returned by value, so the result of calling foo() is a value of type struct Foo, and foo().f is a value (not an lvalue) of type char[25].

This is, as far as I know, the only case in C (up to C99) where you can have a non-lvalue expression of array type. I'd say that the behavior of attempting to access it is undefined by omission, likely because the authors of the standard (understandably IMHO) didn't think of this case. You're likely to see different behaviors at different optimization settings.
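One way to see that foo().f really has array type, without ever accessing the array, is the sizeof operator, which is one of the exceptions listed in 6.3.2.1p3. The following is a minimal sketch; member_size is a hypothetical helper that does not appear in the question.

```c
#include <stddef.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

Foo foo(void)
{
    Foo f = {"Hello, World!"};
    return f;
}

/* member_size is a hypothetical helper, not part of the question.
   The operand of sizeof is not evaluated here and is not converted
   to a pointer, so the result is the size of the array type char[25]. */
size_t member_size(void)
{
    return sizeof foo().f;
}
```

Because the operand is never evaluated or converted, this does not run into the missing "array object" problem described above.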

The new 2011 C standard patches this corner case by introducing objects with temporary lifetime. N1570 (a late pre-C11 draft) says in 6.2.4p8:

A non-lvalue expression with structure or union type, where the structure or union contains a member with array type (including, recursively, members of all contained structures and unions) refers to an object with automatic storage duration and temporary lifetime. Its lifetime begins when the expression is evaluated and its initial value is the value of the expression. Its lifetime ends when the evaluation of the containing full expression or full declarator ends. Any attempt to modify an object with temporary lifetime results in undefined behavior.

So the program's behavior is well defined in C11. Until you're able to get a C11-conforming compiler, though, your best bet is probably to store the result of the function in a local object (assuming your goal is working code rather than breaking compilers):

    [...]
    int main(void) {
        struct Foo temp = foo();
        printf("%s\n", temp.f);
    }

Keith Thompson answered Sep 23 '22 08:09
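If the same source has to build under both older and newer standards, the choice between the two forms could be made at compile time. This is only a sketch under the assumption that the compiler reports its language version honestly through __STDC_VERSION__; greeting_length is a hypothetical helper, not something from the question.

```c
#include <string.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

Foo foo(void)
{
    Foo f = {"Hello, World!"};
    return f;
}

/* greeting_length is a hypothetical helper used only for illustration. */
size_t greeting_length(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11 and later: the result of foo() is an object with temporary
       lifetime that lasts until the end of the full expression, so
       strlen may read it. */
    return strlen(foo().f);
#else
    /* Earlier standards: copy the result into a named object first. */
    Foo temp = foo();
    return strlen(temp.f);
#endif
}
```

In practice the second branch is valid everywhere, so simply always using the named local is the less fragile choice.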


printf is a bit funny, because it's one of those functions that takes varargs. So let's break it down by writing a helper function bar. We'll return to printf later.

(I'm using "gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3")

    void bar(const char *t) {
        printf("bar: %s\n", t);
    }

and calling that instead:

bar(foo().f); // error: invalid use of non-lvalue array 

OK, that gives an error. In C and C++, you are not allowed to pass an array by value. You can work around this limitation by putting the array inside a struct, for example void bar2(Foo f) {...}

But we're not using that workaround, so we can't pass the array by value. Now, you might think it should decay to a char*, allowing you to pass the array by reference. But decay only works if the array has an address (i.e. is an lvalue), and temporaries, such as the return values of functions, live in a magic land where they don't have an address, so you can't apply & to them. In short, foo().f can't decay to a pointer because we can't take the address of a temporary. We are unable to pass it by value (because it's an array) or by reference (because it's a temporary).
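For comparison, the struct-wrapping workaround mentioned above could be sketched as follows. The body of bar2 and its return value are assumptions made for illustration; the answer only gives the signature.

```c
#include <stdio.h>
#include <string.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

Foo foo(void)
{
    Foo f = {"Hello, World!"};
    return f;
}

/* The whole struct is copied into the parameter f. Inside bar2, f is a
   named object, so f.f is an lvalue and decays to char * as usual.
   Returning the string length is an addition for illustration only. */
size_t bar2(Foo f)
{
    printf("bar2: %s\n", f.f);
    return strlen(f.f);
}
```

A call such as bar2(foo()) passes the returned struct straight through without ever forming a pointer into the temporary.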

I found that the following code worked:

bar(&(foo().f[0])); 

but to be honest I think that's suspect. Hasn't this broken the rules I just listed?

And just to be complete, this works perfectly as it should:

    Foo f = foo();
    bar(f.f);

The variable f is not a temporary and hence we can (implicitly, during decay) take its address.

printf, 32-bit versus 64-bit, and weirdness

I promised to mention printf again. According to the above, it should refuse to pass foo().f to any function (including printf). But printf is funny because it's one of those vararg functions. gcc allowed itself to pass the array by value to the printf.

When I first compiled and ran the code, it was in 64-bit mode. I didn't see confirmation of my theory until I compiled in 32-bit (-m32 to gcc). Sure enough I got a segfault, as in the original question. (I had been getting some gibberish output, but no segfault, when in 64 bits).

I implemented my own my_printf (with the vararg nonsense) which printed the actual value of the char * before trying to print the letters pointed at by the char*. I called it like so:

    my_printf("%s\n", f.f);
    my_printf("%s\n", foo().f);

and this is the output I got (code on ideone):

    arg = 0xffc14eb3        // my_printf("%s\n", f.f); // worked fine
    string = Hello, World!
    arg = 0x6c6c6548        // my_printf("%s\n", foo().f); // it's about to crash!
    Segmentation fault
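The original my_printf was linked rather than shown, so the following is only a hypothetical reconstruction of what such a function might look like. It assumes the format consumes exactly one char * argument (as "%s\n" does), and the return value is an addition for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reconstruction, not the answer's actual code.
   Assumes exactly one variadic argument of type char *. */
char *my_printf(const char *fmt, ...)
{
    va_list ap;
    char *arg;

    /* First pass: read the raw argument and show it as a pointer. */
    va_start(ap, fmt);
    arg = va_arg(ap, char *);
    va_end(ap);
    printf("arg = %p\n", (void *)arg);

    /* Second pass: let vprintf format the same argument normally. */
    va_start(ap, fmt);
    printf("string = ");
    vprintf(fmt, ap);
    va_end(ap);

    return arg;
}
```

Called with a named object, as in my_printf("%s\n", f.f), this prints the address of f.f and then the string it points to.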

The first pointer value, 0xffc14eb3, is correct (it points to the characters "Hello, World!"), but look at the second, 0x6c6c6548. Those are the ASCII codes for "Hell": 0x48 'H', 0x65 'e', 0x6c 'l', 0x6c 'l', with the first character in the least significant byte because x86 is little-endian. The array has been copied by value into the argument list of printf, and its first four bytes have been interpreted as a 32-bit pointer. This pointer doesn't point anywhere sensible, and hence the program crashes when it attempts to access that location.
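The byte-by-byte reading of 0x6c6c6548 can be checked with a few lines of C. This is a small sketch that assumes an ASCII execution character set; bytes_le32 is a hypothetical helper name.

```c
#include <stdint.h>

/* bytes_le32 is a hypothetical helper. It writes the four bytes of v
   into out, least significant byte first (the order in which a
   little-endian machine stores them), and NUL-terminates the result. */
void bytes_le32(uint32_t v, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        out[i] = (char)((v >> (8 * i)) & 0xFFu);
    }
    out[4] = '\0';
}
```

Passing 0x6c6c6548 yields the string "Hell", matching the first four characters of the array that was copied into the argument list.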

I think this is in violation of the standard, simply by virtue of the fact that we're not supposed to be allowed to copy arrays by value.

Aaron McDaid answered Sep 21 '22 08:09