Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the next step to improve malloc() algorithm? [closed]

I'm writing my own simple malloc() function and I would like to create more faster and efficient variant. I'm writed function that use linear search and allocated sequentially and contiguously in memory.

What is the next step to improve this algorithm? What are the main shortcomings of my current version? I would be very grateful for any feedback and recommendation.

typedef struct heap_block
{  
    struct heap_block* next;
    size_t size;
    bool isfree;
}header;

#define Heap_Capacity 100000
static char heap[Heap_Capacity];
size_t heap_size;

void* malloc(size_t sz) 
{
    if(sz == 0 || sz > Heap_Capacity) { return NULL; }

    header* block = (header*)heap;
    if(heap_size == 0)
    {
        set_data_to_block(block, sz);
        return (void*)(block+1);
    }

    while(block->next != NULL) { block = block->next; }

    block->next = (header*)((char*)to_end_data(block) + 8);
    header* new_block = block->next;
    set_data_to_block(new_block, sz);

    return (void*)(new_block+1);
}

void set_data_to_block(header* block, size_t sz)
{
    block->size = sz;
    block->isfree = false;
    block->next = NULL;
    heap_size += sz;
}

header* to_end_data(header* block)
{
    return (header*)((size_t)(block+1) + block->size);
}
like image 226
Anatoly Avatar asked May 02 '15 16:05

Anatoly


People also ask

What should we do after malloc?

When you no longer need a block that you got with malloc , use the function free to make the block available to be allocated again. The prototype for this function is in stdlib. h .

What will happen if you keep using malloc () but never free ()?

If free() is not used in a program the memory allocated using malloc() will be de-allocated after completion of the execution of the program (included program execution time is relatively small and the program ends normally).

Which structure is internally used by malloc and free functions?

When one calls malloc , memory is taken from the large heap cell, which is returned by malloc . The rest is formed into a new heap cell that consists of all the rest of the memory. When one frees memory, the heap cell is added to the end of the heap's free list.

How malloc and free is implemented in C?

The “malloc” or “memory allocation” method in C is used to dynamically allocate a single large block of memory with the specified size. It returns a pointer of type void which can be cast into a pointer of any form.


2 Answers

There are 3 ways to improve:

  • make it more robust
  • optimise the way memory is allocated
  • optimise the code that allocates the memory

Make It More Robust

There are many common programmer mistakes that could be easily detected (e.g. modifying data beyond the end of the allocated block). For a very simple (and relatively fast) example, it's possible to insert "canaries" before and after the allocated block, and detect programmer during free and realloc by checking if the canaries are correct (if they aren't then the programmer has trashed something by mistake). This only works "sometimes maybe". The problem is that (for a simple implementation of malloc) the meta-data is mixed in with the allocated blocks; so there's a chance that the meta-data has been corrupted even if the canaries haven't. To fix that it'd be a good idea to separate the meta-data from the allocated blocks. Also, merely reporting "something was corrupted" doesn't help as much as you'd hope. Ideally you'd want to have some sort of identifier for each block (e.g. the name of the function that allocated it) so that if/when problems occur you can report what was corrupted. Of course this could/should maybe be done via. a macro, so that those identifiers can be omitted when not debugging.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to return acceptable error conditions ("failed to allocate" is the only error it can return) and no way to pass additional information. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier) (with similar alterations to free and realloc to enable them to return a status code and identifier).

Optimise the Way Memory is Allocated

It's naive to assume that all memory is the same. It's not. Cache locality (including TLB locality) and other cache effects, and things like NUMA optimisation, all matter. For a simple example, imagine you're writing an application that has a structure describing a person (including a hash of their name) and a pointer to a person's name string; and both the structure and the name string are allocated via. malloc. The normal end result is that those structures and strings end up mixed together in the heap; so when you're searching through these structures (e.g. trying to find the structure that contains the correct hash) you end up pounding caches and TLBs. To optimise this properly you'd want to ensure that all the structures are close together in the heap. For that to work malloc needs to the difference between allocating 32 bytes for the structure and allocating 32 bytes for the name string. You need to introduce the concept of "memory pools" (e.g. where everything in "memory pool number 1" is kept close in the heap).

Another important optimisations include "cache colouring" (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it can be important to know the difference between something where max. bandwidth is needed (where using memory from multiple NUMA domains increases bandwidth).

Finally, it'd be nice (to manage heap fragmentation, etc) to use different strategies for "temporary, likely to be freed soon" allocations and longer term allocations (e.g. where it's worth doing a extra to minimise fragmentation and wasted space/RAM).

Note: I'd estimate that getting all of this right can mean software running up to 20% faster in specific cases, due to far less cache misses, more bandwidth where it's needed, etc.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to pass the additional information to malloc in the first place. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags) (with similar alterations to realloc).

Optimise the Code That Allocates the Memory

Given that you can assume memory is used more frequently than its allocated; this is the least important (e.g. less important than getting things like cache locality for the allocated blocks right).

Quite frankly, anyone that actually wants decent performance or decent debugging shouldn't be using malloc to begin with - generic solutions to specific problems are never ideal. With this in mind (and not forgetting that the interface to malloc is lame and broken and prevents everything that's important anyway) I'd recommend simply not bothering with malloc and creating something that's actually good (but non-standard) instead. For this you can adapt the algorithms used by existing implementations of malloc.

like image 39
Brendan Avatar answered Oct 04 '22 02:10

Brendan


Notice that malloc is often built above lower level memory related syscalls (e.g. mmap(2) on Linux). See this answer which mentions GNU glibc and musl-libc. Look also inside tcmalloc, so study the source code of several free software malloc implementations.

Some general ideas for your malloc:

  • retrieve memory from the OS using mmap (and release it back to the OS kernel eventually with munmap). You certainly should not allocate a fixed size heap (since on a 64 bits computer with 128Gbytes of RAM you would want to succeed in malloc-ing a 10 billion bytes zone).
  • segregate small allocations from big ones, so handle differently malloc of 16 bytes from malloc of a megabyte. The typical threshold between small and large allocation is generally a small multiple of the page size (which is often 4Kbytes). Small allocations happen inside pages. Large allocations are rounded to pages. You might even handle very specially malloc of two words (like in many linked lists).
  • round up the requested size to some fancy number (e.g. powers of 2, or 3 times a power of 2).
  • manage together memory zones of similar sizes, that is of same "fancy" size.
  • for small memory zones, avoid reclaiming too early the memory zone, so keep previously free-d zones of same (small) size to reuse them in future calls to malloc.
  • you might use some tricks on the address (but your system might have ASLR), or keep near each memory zone a word of meta-data describing the chunk of which it is a member.
  • a significant issue is, given some address previously returned by malloc and argument of free, to find out the allocated size of that memory zone. You could manipulate the address bits, you could store that size in the word before, you could use some hash table, etc. Details are tricky.

Notice that details are tricky, and it might be quite hard to write a malloc implementation better than your system one. In practice, writing a good malloc is not a simple task. You should find many academic papers on the subject.

Look also into garbage collection techniques. Consider perhaps Boehm's conservative GC: you will replace malloc by GC_MALLOC and you won't bother about free... Learn about memory pools.

like image 84
Basile Starynkevitch Avatar answered Oct 04 '22 01:10

Basile Starynkevitch