What are the next step to improve malloc() algorithm? [closed]

Tags:

I'm writing my own simple malloc() function and I would like to create more faster and efficient variant. I'm writed function that use linear search and allocated sequentially and contiguously in memory.

What is the next step to improve this algorithm? What are the main shortcomings of my current version? I would be very grateful for any feedback and recommendation.

Click to copy

typedef struct heap_block
{  
    struct heap_block* next;
    size_t size;
    bool isfree;
}header;

#define Heap_Capacity 100000
static char heap[Heap_Capacity];
size_t heap_size;

void* malloc(size_t sz) 
{
    if(sz == 0 || sz > Heap_Capacity) { return NULL; }

    header* block = (header*)heap;
    if(heap_size == 0)
    {
        set_data_to_block(block, sz);
        return (void*)(block+1);
    }

    while(block->next != NULL) { block = block->next; }

    block->next = (header*)((char*)to_end_data(block) + 8);
    header* new_block = block->next;
    set_data_to_block(new_block, sz);

    return (void*)(new_block+1);
}

void set_data_to_block(header* block, size_t sz)
{
    block->size = sz;
    block->isfree = false;
    block->next = NULL;
    heap_size += sz;
}

header* to_end_data(header* block)
{
    return (header*)((size_t)(block+1) + block->size);
}

226

asked May 02 '15 16:05

Anatoly

2 Answers

There are 3 ways to improve:

make it more robust
optimise the way memory is allocated
optimise the code that allocates the memory

Make It More Robust

There are many common programmer mistakes that could be easily detected (e.g. modifying data beyond the end of the allocated block). For a very simple (and relatively fast) example, it's possible to insert "canaries" before and after the allocated block, and detect programmer during free and realloc by checking if the canaries are correct (if they aren't then the programmer has trashed something by mistake). This only works "sometimes maybe". The problem is that (for a simple implementation of malloc) the meta-data is mixed in with the allocated blocks; so there's a chance that the meta-data has been corrupted even if the canaries haven't. To fix that it'd be a good idea to separate the meta-data from the allocated blocks. Also, merely reporting "something was corrupted" doesn't help as much as you'd hope. Ideally you'd want to have some sort of identifier for each block (e.g. the name of the function that allocated it) so that if/when problems occur you can report what was corrupted. Of course this could/should maybe be done via. a macro, so that those identifiers can be omitted when not debugging.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to return acceptable error conditions ("failed to allocate" is the only error it can return) and no way to pass additional information. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier) (with similar alterations to free and realloc to enable them to return a status code and identifier).

Optimise the Way Memory is Allocated

It's naive to assume that all memory is the same. It's not. Cache locality (including TLB locality) and other cache effects, and things like NUMA optimisation, all matter. For a simple example, imagine you're writing an application that has a structure describing a person (including a hash of their name) and a pointer to a person's name string; and both the structure and the name string are allocated via. malloc. The normal end result is that those structures and strings end up mixed together in the heap; so when you're searching through these structures (e.g. trying to find the structure that contains the correct hash) you end up pounding caches and TLBs. To optimise this properly you'd want to ensure that all the structures are close together in the heap. For that to work malloc needs to the difference between allocating 32 bytes for the structure and allocating 32 bytes for the name string. You need to introduce the concept of "memory pools" (e.g. where everything in "memory pool number 1" is kept close in the heap).

Another important optimisations include "cache colouring" (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it can be important to know the difference between something where max. bandwidth is needed (where using memory from multiple NUMA domains increases bandwidth).

Finally, it'd be nice (to manage heap fragmentation, etc) to use different strategies for "temporary, likely to be freed soon" allocations and longer term allocations (e.g. where it's worth doing a extra to minimise fragmentation and wasted space/RAM).

Note: I'd estimate that getting all of this right can mean software running up to 20% faster in specific cases, due to far less cache misses, more bandwidth where it's needed, etc.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to pass the additional information to malloc in the first place. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags) (with similar alterations to realloc).

Optimise the Code That Allocates the Memory

Given that you can assume memory is used more frequently than its allocated; this is the least important (e.g. less important than getting things like cache locality for the allocated blocks right).

Quite frankly, anyone that actually wants decent performance or decent debugging shouldn't be using malloc to begin with - generic solutions to specific problems are never ideal. With this in mind (and not forgetting that the interface to malloc is lame and broken and prevents everything that's important anyway) I'd recommend simply not bothering with malloc and creating something that's actually good (but non-standard) instead. For this you can adapt the algorithms used by existing implementations of malloc.

answered Oct 04 '22 02:10

Brendan

Notice that malloc is often built above lower level memory related syscalls (e.g. mmap(2) on Linux). See this answer which mentions GNU glibc and musl-libc. Look also inside tcmalloc, so study the source code of several free software malloc implementations.

Some general ideas for your malloc:

retrieve memory from the OS using mmap (and release it back to the OS kernel eventually with munmap). You certainly should not allocate a fixed size heap (since on a 64 bits computer with 128Gbytes of RAM you would want to succeed in malloc-ing a 10 billion bytes zone).
segregate small allocations from big ones, so handle differently malloc of 16 bytes from malloc of a megabyte. The typical threshold between small and large allocation is generally a small multiple of the page size (which is often 4Kbytes). Small allocations happen inside pages. Large allocations are rounded to pages. You might even handle very specially malloc of two words (like in many linked lists).
round up the requested size to some fancy number (e.g. powers of 2, or 3 times a power of 2).
manage together memory zones of similar sizes, that is of same "fancy" size.
for small memory zones, avoid reclaiming too early the memory zone, so keep previously free-d zones of same (small) size to reuse them in future calls to malloc.
you might use some tricks on the address (but your system might have ASLR), or keep near each memory zone a word of meta-data describing the chunk of which it is a member.
a significant issue is, given some address previously returned by malloc and argument of free, to find out the allocated size of that memory zone. You could manipulate the address bits, you could store that size in the word before, you could use some hash table, etc. Details are tricky.

Notice that details are tricky, and it might be quite hard to write a malloc implementation better than your system one. In practice, writing a good malloc is not a simple task. You should find many academic papers on the subject.

Look also into garbage collection techniques. Consider perhaps Boehm's conservative GC: you will replace malloc by GC_MALLOC and you won't bother about free... Learn about memory pools.

answered Oct 04 '22 01:10

Basile Starynkevitch

Related questions
                            
                                In how many ways can a function be invoked(called) in C++?
                            
                                memset() to initialize object in constructor?
                            
                                Lint prefers erase() method on std:string rather than clear()
                            
                                c++ ordered(stable) priority queue
                            
                                Visual Studio custom build event always executing
                            
                                can an std::promise be made from a non-POD object?
                            
                                How can I build my local git repo on external server?
                            
                                std::unique_ptr to transfer ownership of a const object
                            
                                Why cast expression to rvalue reference to function is lvalue?
                            
                                Constant expression string parameter in C++11 constexpr function
                            
                                Is this multiple accesses to the same object between sequence points well-defined behavior?
                            
                                How can I return a dependent type from templated class method?
                            
                                CRT library type
                            
                                Variable has value without being assigned C++
                            
                                Is the result of static_casting a constexpr void* a constant expression?
                            
                                Perlin Noise Generation [closed]
                            
                                Get SHA1 of Unicode string in Crypto++
                            
                                Loss of precision when using pow in C++
                            
                                Only overload operator if template argument does
                            
                                C/C++: -msse and -msse2 Flags do not have any effect on the binaries?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are the next step to improve malloc() algorithm? [closed]

Tags:

c++

c

memory-management

algorithm

Anatoly

People also ask

2 Answers

Brendan

Basile Starynkevitch

Recent Activity

Donate For Us