Can I "over-extend" an array by allocating more space to the enclosing struct?

Frankly, is code like this valid, or does it invoke undefined behavior (UB)?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct __attribute__((__packed__)) weird_struct
{
    int some;
    unsigned char value[1];
};

int main(void)
{
    unsigned char text[] = "Allie has a cat";
    struct weird_struct *ws =
        malloc(sizeof(struct weird_struct) + sizeof(text) - 1);
    ws->some = 5;
    strcpy(ws->value, text);
    printf("some = %d, value = %s\n", ws->some, ws->value);
    free(ws);
    return 0;
}

http://ideone.com/lpByQD

I’d never have thought it valid to do something like this, but System V message queues seem to do exactly that: see the man page.

So, if SysV msg queues can do that, perhaps I can do this too? I think I’d find this useful to send data over the network (hence the __attribute__((__packed__))).

Or perhaps this is a specific guarantee of SysV msg queues and I shouldn’t do something like that elsewhere? Or perhaps this technique can be employed, but I’m doing it wrong? I figured I’d better ask.

The - 1 in malloc(sizeof(struct weird_struct) + sizeof(text) - 1) is there because one byte is already allocated thanks to unsigned char value[1], so I can subtract it from sizeof(text).


2 Answers

The standard C way (since C99) to do this is to use a flexible array member. The last member of the structure must have an incomplete array type, and you allocate the required amount of memory at run time.

Something like

struct __attribute__((__packed__)) weird_struct
{
    int some;
    unsigned char value[];   // nothing between the brackets: no 0, no 1
};

and later

struct weird_struct *ws =
    malloc(sizeof(struct weird_struct) + strlen("this to be copied") + 1);

or

struct weird_struct *ws =
    malloc(sizeof(struct weird_struct) + sizeof("this to be copied"));

will do the job.
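As a minimal sketch of how the allocation and the copy fit together, the snippet below wraps them in one helper. The names struct message and message_create are hypothetical and do not come from the question, and the packed attribute is left out because it is a GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same members as the struct above; the packed attribute is omitted
   here (assumption: portability matters more than exact layout). */
struct message {
    int some;
    unsigned char value[];   /* flexible array member, C99 and later */
};

/* Hypothetical helper: allocates a message holding a copy of text,
   including its terminating NUL. Returns NULL if malloc fails. */
struct message *message_create(int some, const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text) + 1;               /* payload bytes, with NUL */
    struct message *m = malloc(sizeof *m + len); /* header plus payload */
    if (m == NULL)
        return NULL;

    m->some = some;
    memcpy(m->value, text, len);                 /* stays inside the allocation */
    return m;
}
```

The caller releases the whole object with a single free(), since the header and the payload live in one allocation.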

For reference, the C11 standard says in §6.7.2.1, paragraph 18:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.


Regarding the one-element array usage, the online GCC manual's page on zero-length arrays gives this example:

struct line {
  int length;
  char contents[0];
};

struct line *thisline = (struct line *)
  malloc (sizeof (struct line) + this_length);
thisline->length = this_length;

In ISO C90, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc.

That also explains the - 1 in the malloc() argument: sizeof(char) is guaranteed to be 1 in C, so the one-element array accounts for exactly one byte of the payload.
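The difference between the two size computations can be sketched as follows. The struct and function names are hypothetical, and the structs are not packed, so sizeof may include trailing padding on a typical implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct one_elem { int some; unsigned char value[1]; };  /* C90-style idiom */
struct flexible { int some; unsigned char value[]; };   /* C99 flexible array member */

/* Bytes to request for an n-byte payload with the one-element idiom:
   one payload byte is already counted in sizeof, hence the "- 1". */
size_t one_elem_alloc_size(size_t n)
{
    assert(n >= 1 && n <= SIZE_MAX - sizeof(struct one_elem));
    return sizeof(struct one_elem) + n - 1;
}

/* With a flexible array member the array contributes nothing to sizeof,
   so the payload size is simply added. */
size_t flexible_alloc_size(size_t n)
{
    assert(n <= SIZE_MAX - sizeof(struct flexible));
    return sizeof(struct flexible) + n;
}
```

In both layouts the payload starts at the same offset; only the value that sizeof reports for the header differs.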

Sourav Ghosh answered Sep 15 '25 04:09



The Standard allows implementations to act in any way they see fit if code accesses an array object beyond its stated bounds, even if the code owns the storage that would be accessed thereby. So far as I can tell, this rule is intended to allow a compiler, given something like:

struct s1 { char arr[4]; char y; } *p;
int x;
...
p->y = 1;
p->arr[x] = 2;
return p->y;

to treat it as equivalent to:

struct s1 { char arr[4]; char y; } *p;
int x;
...
p->arr[x] = 2;
p->y = 1;
return 1;

avoiding an extra load step, without having to pessimistically allow for the possibility that x might equal 4. Quality compilers should be able to recognize certain constructs that access arrays beyond their stated bounds (e.g. those involving a pointer to a structure with a single-element array as its last member) and handle them sensibly, but nothing in the Standard requires them to do so, and some compiler writers take the attitude that permission for compilers to behave in nonsensical fashion should be interpreted as an invitation to do so.

I think the behavior would be defined, even for the x==4 case (meaning the compiler would have to allow for the possibility of the write modifying y), if the array write were handled via something like: ((char*)(struct s1*)(p->arr))[x] = 2; but the Standard is not really clear on whether the cast to struct s1* is necessary.
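One way to sketch a store that the compiler must treat as possibly modifying y is to index from a char pointer to the whole structure instead of from p->arr. This is a different technique from the cast expression discussed above, the function names are hypothetical, and it rests on the common reading that any byte of an object may be accessed through a char pointer to that object.

```c
#include <assert.h>
#include <stddef.h>

struct s1 { char arr[4]; char y; };

/* Hypothetical helper: stores v at x bytes past the start of arr, going
   through a char pointer to the whole struct rather than through p->arr. */
void store_via_struct_bytes(struct s1 *p, size_t x, char v)
{
    assert(offsetof(struct s1, arr) + x < sizeof *p);  /* stay inside *p */
    char *bytes = (char *)p;
    bytes[offsetof(struct s1, arr) + x] = v;
}

char write_then_read_y(struct s1 *p, size_t x)
{
    p->y = 1;
    store_via_struct_bytes(p, x, 2);
    return p->y;   /* must be reloaded: the store may have landed on y */
}
```

Because the index is applied to a pointer that covers the entire object, the compiler cannot assume the store stays within arr, so it has to reload y before returning.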

supercat answered Sep 15 '25 05:09
