Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does `free` in C not take the number of bytes to be freed?

Just to be clear: I do know that malloc and free are implemented in the C library, which usually allocates chunks of memory from the OS and does its own management to parcel out smaller lots of memory to the application and keeps track of the number of bytes allocated. This question is not How does free know how much to free.

Rather, I want to know why free was made this way in the first place. Being a low-level language, I think it would be perfectly reasonable to ask a C programmer to keep track not only of what memory was allocated but how much (in fact, I commonly find that I end up keeping track of the number of bytes malloced anyway). It also occurs to me that explicitly giving the number of bytes to free might allow for some performance optimisations, e.g. an allocator that has separate pools for different allocation sizes would be able to determine which pool to free from just by looking at the input arguments, and there would be less space overhead overall.

So, in short, why were malloc and free created such that they're required to internally keep track of the number of bytes allocated? Is it just a historical accident?

A small edit: A few people have provided points like "what if you free a different amount than what you allocated". My imagined API could simply require one to free exactly the number of bytes allocated; freeing more or less could simply be UB or implementation defined. I don't want to discourage discussion about other possibilities, though.

like image 304
jaymmer - Reinstate Monica Avatar asked Jun 13 '14 11:06

jaymmer - Reinstate Monica


People also ask

How does free () know how many bytes to free?

How does free() know the size of memory to be deallocated? The free() function is used to deallocate memory while it is allocated using malloc(), calloc() and realloc(). The syntax of the free is simple. We simply use free with the pointer.

What happens when free in C?

free() just declares, to the language implementation or operating system, that the memory is no longer required.

How does malloc know memory to free when a user calls free on a pointer?

When you call malloc() , you specify the amount of memory to allocate. The amount of memory actually used is slightly more than this, and includes extra information that records (at least) how big the block is.

Why do we use free in C?

“free” method in C is used to dynamically de-allocate the memory. The memory allocated using functions malloc() and calloc() is not de-allocated on their own. Hence the free() method is used, whenever the dynamic memory allocation takes place. It helps to reduce wastage of memory by freeing it.


2 Answers

One-argument free(void *) (introduced in Unix V7) has another major advantage over the earlier two-argument mfree(void *, size_t) which I haven't seen mentioned here: one argument free dramatically simplifies every other API that works with heap memory. For example, if free needed the size of the memory block, then strdup would somehow have to return two values (pointer + size) instead of one (pointer), and C makes multiple-value returns much more cumbersome than single-value returns. Instead of char *strdup(char *) we'd have to write char *strdup(char *, size_t *) or else struct CharPWithSize { char *val; size_t size}; CharPWithSize strdup(char *). (Nowadays that second option looks pretty tempting, because we know that NUL-terminated strings are the "most catastrophic design bug in the history of computing", but that's hindsight speaking. Back in the 70's, C's ability to handle strings as a simple char * was actually considered a defining advantage over competitors like Pascal and Algol.) Plus, it isn't just strdup that suffers from this problem -- it affects every system- or user-defined function which allocates heap memory.

The early Unix designers were very clever people, and there are many reasons why free is better than mfree so basically I think the answer to the question is that they noticed this and designed their system accordingly. I doubt you'll find any direct record of what was going on inside their heads at the moment they made that decision. But we can imagine.

Pretend that you're writing applications in C to run on V6 Unix, with its two-argument mfree. You've managed okay so far, but keeping track of these pointer sizes is becoming more and more of a hassle as your programs become more ambitious and require more and more use of heap allocated variables. But then you have a brilliant idea: instead of copying around these size_ts all the time, you can just write some utility functions, which stash the size directly inside the allocated memory:

void *my_alloc(size_t size) {     void *block = malloc(sizeof(size) + size);     *(size_t *)block = size;     return (void *) ((size_t *)block + 1); } void my_free(void *block) {     block = (size_t *)block - 1;     mfree(block, *(size_t *)block); } 

And the more code you write using these new functions, the more awesome they seem. Not only do they make your code easier to write, they also make your code faster -- two things which don't often go together! Before you were passing these size_ts around all over the place, which added CPU overhead for the copying, and meant you had to spill registers more often (esp. for the extra function arguments), and wasted memory (since nested function calls will often result in multiple copies of the size_t being stored in different stack frames). In your new system, you still have to spend the memory to store the size_t, but only once, and it never gets copied anywhere. These may seem like small efficiencies, but keep in mind that we're talking about high-end machines with 256 KiB of RAM.

This makes you happy! So you share your cool trick with the bearded men who are working on the next Unix release, but it doesn't make them happy, it makes them sad. You see, they were just in the process of adding a bunch of new utility functions like strdup, and they realize that people using your cool trick won't be able to use their new functions, because their new functions all use the cumbersome pointer+size API. And then that makes you sad too, because you realize you'll have to rewrite the good strdup(char *) function yourself in every program you write, instead of being able to use the system version.

But wait! This is 1977, and backwards compatibility won't be invented for another 5 years! And besides, no-one serious actually uses this obscure "Unix" thing with its off-color name. The first edition of K&R is on its way to the publisher now, but that's no problem -- it says right on the first page that "C provides no operations to deal directly with composite objects such as character strings... there is no heap...". At this point in history, string.h and malloc are vendor extensions (!). So, suggests Bearded Man #1, we can change them however we like; why don't we just declare your tricky allocator to be the official allocator?

A few days later, Bearded Man #2 sees the new API and says hey, wait, this is better than before, but it's still spending an entire word per allocation storing the size. He views this as the next thing to blasphemy. Everyone else looks at him like he's crazy, because what else can you do? That night he stays late and invents a new allocator that doesn't store the size at all, but instead infers it on the fly by performing black magic bitshifts on the pointer value, and swaps it in while keeping the new API in place. The new API means that no-one notices the switch, but they do notice that the next morning the compiler uses 10% less RAM.

And now everyone's happy: You get your easier-to-write and faster code, Bearded Man #1 gets to write a nice simple strdup that people will actually use, and Bearded Man #2 -- confident that he's earned his keep for a bit -- goes back to messing around with quines. Ship it!

Or at least, that's how it could have happened.

like image 93
Nathaniel J. Smith Avatar answered Sep 19 '22 05:09

Nathaniel J. Smith


"Why does free in C not take the number of bytes to be freed?"

Because there's no need for it, and it wouldn't quite make sense anyway.

When you allocate something, you want to tell the system how many bytes to allocate (for obvious reasons).

However, when you have already allocated your object, the size of the memory region you get back is now determined. It's implicit. It's one contiguous block of memory. You can't deallocate part of it (let's forget realloc(), that's not what it's doing anyway), you can only deallocate the entire thing. You can't "deallocate X bytes" either -- you either free the memory block you got from malloc() or you don't.

And now, if you want to free it, you can just tell the memory manager system: "here's this pointer, free() the block it is pointing to." - and the memory manager will know how to do that, either because it implicitly knows the size, or because it might not even need the size.

For example, most typical implementations of malloc() maintain a linked list of pointers to free and allocated memory blocks. If you pass a pointer to free(), it will just search for that pointer in the "allocated" list, un-link the corresponding node and attach it to the "free" list. It didn't even need the region size. It will only need that information when it potentially attempts to re-use the block in question.

like image 39
The Paramagnetic Croissant Avatar answered Sep 21 '22 05:09

The Paramagnetic Croissant