Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

malloc for struct and pointer in C

Suppose I want to define a structure representing length of the vector and its values as:

struct Vector{
    double* x;
    int n;
};

Now, suppose I want to define a vector y and allocate memory for it.

struct Vector *y = (struct Vector*)malloc(sizeof(struct Vector));

My search over the internet show that I should allocate the memory for x separately.

y->x = (double*)malloc(10*sizeof(double));

But, it seems that I am allocating the memory for y->x twice, one while allocating memory for y and the other while allocating memory for y->x, and it seems a waste of memory. It is very much appreciated if let me know what compiler really do and what would be the right way to initialize both y, and y->x.

Thanks in advance.

like image 930
Pouya Avatar asked Feb 08 '13 08:02

Pouya


3 Answers

No, you're not allocating memory for y->x twice.

Instead, you're allocating memory for the structure (which includes a pointer) plus something for that pointer to point to.

Think of it this way:

         1          2
        +-----+    +------+
y------>|  x------>|  *x  |
        |  n  |    +------+
        +-----+

You actually need the two allocations (1 and 2) to store everything you need.

Additionally, your type should be struct Vector *y since it's a pointer, and you should never cast the return value from malloc in C.

It can hide certain problems you don't want hidden, and C is perfectly capable of implicitly converting the void* return value to any other pointer.

And, of course, you probably want to encapsulate the creation of these vectors to make management of them easier, such as with having the following in a header file vector.h:

struct Vector {
    double *data;    // Use readable names rather than x/n.
    size_t size;
};

struct Vector *newVector(size_t sz);
void delVector(struct Vector *vector);
//void setVectorItem(struct Vector *vector, size_t idx, double val);
//double getVectorItem(struct Vector *vector, size_t idx);

Then, in vector.c, you have the actual functions for managing the vectors:

#include "vector.h"

// Atomically allocate a two-layer object. Either both layers
// are allocated or neither is, simplifying memory checking.

struct Vector *newVector(size_t sz) {
    // First, the vector layer.

    struct Vector *vector = malloc(sizeof (struct Vector));
    if (vector == NULL)
        return NULL;

    // Then data layer, freeing vector layer if fail.

    vector->data = malloc(sz * sizeof (double));
    if (vector->data == NULL) {
        free(vector);
        return NULL;
    }

    // Here, both layers worked. Set size and return.

    vector->size = sz;
    return vector;
}

void delVector(struct Vector *vector) {
    // Can safely assume vector is NULL or fully built.

    if (vector != NULL) {
        free(vector->data);
        free(vector);
    }
}

By encapsulating the vector management like that, you ensure that vectors are either fully built or not built at all - there's no chance of them being half-built.

It also allows you to totally change the underlying data structures in future without affecting clients. For example:

  • if you wanted to make them sparse arrays to trade off space for speed.
  • if you wanted the data saved to persistent storage whenever changed.
  • if you wished to ensure all vector elements were initialised to zero.
  • if you wanted to separate the vector size from the vector capacity for efficiency(1).

You could also add more functionality such as safely setting or getting vector values (see commented code in the header), as the need arises.

For example, you could (as one option) silently ignore setting values outside the valid range and return zero if getting those values. Or you could raise an error of some description, or attempt to automatically expand the vector under the covers(1).


In terms of using the vectors, a simple example is something like the following (very basic) main.c

#include "vector.h"

#include <stdio.h>

int main(void) {
    Vector myvec = newVector(42);
    myvec.data[0] = 2.718281828459;
    delVector(myvec);
}

(1) That potential for an expandable vector bears further explanation.

Many vector implementations separate capacity from size. The former is how many elements you can use before a re-allocation is needed, the latter is the actual vector size (always <= the capacity).

When expanding, you want to generally expand in such a way that you're not doing it a lot, since it can be an expensive operation. For example, you could add 5% more than was strictly necessary so that, in a loop continuously adding one element, it doesn't have to re-allocate for every single item.

like image 159
paxdiablo Avatar answered Oct 18 '22 03:10

paxdiablo


The first time around, you allocate memory for Vector, which means the variables x,n.

However x doesn't yet point to anything useful.

So that is why second allocation is needed as well.

like image 22
Karthik T Avatar answered Oct 18 '22 05:10

Karthik T


In principle you're doing it correct already. For what you want you do need two malloc()s.

Just some comments:

struct Vector y = (struct Vector*)malloc(sizeof(struct Vector));
y->x = (double*)malloc(10*sizeof(double));

should be

struct Vector *y = malloc(sizeof *y); /* Note the pointer */
y->x = calloc(10, sizeof *y->x);

In the first line, you allocate memory for a Vector object. malloc() returns a pointer to the allocated memory, so y must be a Vector pointer. In the second line you allocate memory for an array of 10 doubles.

In C you don't need the explicit casts, and writing sizeof *y instead of sizeof(struct Vector) is better for type safety, and besides, it saves on typing.

You can rearrange your struct and do a single malloc() like so:

struct Vector{    
    int n;
    double x[];
};
struct Vector *y = malloc(sizeof *y + 10 * sizeof(double));
like image 4
Wernsey Avatar answered Oct 18 '22 03:10

Wernsey