What is the cost of malloc(), in terms of CPU cycles? (Windows Vista, latest version of gcc, highest optimization level, ...)
Basically, I'm implementing a complex DAG structure (similar to a linked list) composed of 16-byte nodes (less common) and 20-byte nodes (more common).
Occasionally, I will have to remove some nodes and then add some. But rather than always using malloc() and free(), I can simply move unneeded nodes to the end of my data structure and update their fields as my algorithm continues. If a free node is available, I update its fields; if not, I have to allocate a new one.
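The recycling scheme described above amounts to a free list. A minimal sketch, with illustrative names (`node_t`, `node_acquire`, `node_release` are assumptions, not the asker's actual code):

```c
#include <stdlib.h>

/* Hypothetical DAG node; the real nodes are 16 or 20 bytes. */
typedef struct node {
    struct node *next;   /* doubles as the free-list link when recycled */
    int payload[3];
} node_t;

static node_t *free_list = NULL;  /* recycled nodes, singly linked */

/* Reuse a recycled node if one is available; otherwise fall back to malloc(). */
static node_t *node_acquire(void)
{
    if (free_list != NULL) {
        node_t *n = free_list;
        free_list = n->next;
        return n;
    }
    return malloc(sizeof(node_t));
}

/* Instead of free(), push the node onto the free list for later reuse. */
static void node_release(node_t *n)
{
    n->next = free_list;
    free_list = n;
}
```

With this arrangement, releasing and reacquiring a node costs only a couple of pointer assignments, and malloc() is reached only when the free list is empty.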
The problem is, I might have only one free node available while needing to insert, for example, 20 nodes' worth of data. This means the free-node check will usually fail, and I'll end up calling malloc() anyway.
Question: Is it really worth it? Should I just malloc() and free() as usual, or is it worth it to keep some free nodes available at the end of the list, and keep checking even if it will usually fail and lead to malloc() anyway?
More concretely,
What is the CPU cost of malloc()?
The main reason malloc() is rather slow is that it provides a lot of functionality: allocating chunks of memory of variable size is inherently complex.
Dynamic memory allocation and deallocation are very slow operations when compared to automatic memory allocation and deallocation. In other words, the heap is much slower than the stack.
malloc() allocates memory on the process heap. Memory allocated using malloc() will remain on the heap until it is freed using free(). alloca() allocates memory within the current function's stack frame. Memory allocated using alloca() will be removed from the stack when the current function returns.
The "malloc" (memory allocation) function in C is used to dynamically allocate a single large block of memory with the specified size. It returns a pointer of type void *, which can be cast to a pointer of any type.
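A minimal illustration of the basic pattern (the helper name `make_squares` is made up for the example): malloc() returns void *, which converts implicitly to the target pointer type in C, and the memory persists until free() is called.

```c
#include <stdlib.h>

/* Allocate a heap array of n ints and fill it with squares.
   The void * returned by malloc converts implicitly in C;
   an explicit cast is only needed in C++. */
int *make_squares(size_t n)
{
    int *arr = malloc(n * sizeof *arr);   /* heap allocation */
    if (arr == NULL)
        return NULL;                      /* always check for failure */
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)(i * i);
    return arr;                           /* caller must free() this */
}
```

Unlike stack (automatic) storage, this array outlives the function that created it; the caller owns it and must eventually free() it.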
Does it matter what it costs? Really?
The true answer is "it depends". It depends on loads of things.
If this code is massively performance critical, then time everything you can and work out the best pattern for your use case.
If it isn't the most performance-critical bit of code, just do whatever is the clearest and simplest to implement and maintain.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil", Donald Knuth
malloc() does not have a fixed cost in terms of latency, because the memory manager can be in any of numerous states when it fulfills your request.
Since your node sizes are relatively small, you should consider always allocating some larger size, perhaps 10 or more nodes per allocation, and stuffing the extra ones into your unused pool. That way you'll incur allocation uncertainty less frequently. More importantly, you'll reduce the memory fragmentation caused by many tiny allocations.
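A sketch of that chunked approach, with assumed names (`CHUNK`, `pool_refill`, `pool_alloc`); note that for brevity this version never returns the chunks to the system, which a real implementation would need to track:

```c
#include <stdlib.h>

#define CHUNK 10   /* nodes obtained per malloc() call; tune as needed */

typedef struct node {
    struct node *next;   /* free-pool link */
    int payload[3];
} node_t;

static node_t *pool = NULL;  /* singly linked list of unused nodes */

/* Refill the pool with CHUNK nodes from a single malloc() call. */
static int pool_refill(void)
{
    node_t *chunk = malloc(CHUNK * sizeof *chunk);
    if (chunk == NULL)
        return -1;
    for (int i = 0; i < CHUNK; i++) {  /* thread the new nodes onto the pool */
        chunk[i].next = pool;
        pool = &chunk[i];
    }
    return 0;
}

/* Hand out one node, touching malloc() only once per CHUNK nodes. */
static node_t *pool_alloc(void)
{
    if (pool == NULL && pool_refill() != 0)
        return NULL;
    node_t *n = pool;
    pool = n->next;
    return n;
}
```

Besides amortizing the malloc() cost over CHUNK nodes, the nodes within each chunk are contiguous, which helps cache locality when traversing the DAG.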
Incidentally, I don't consider this sort of design consideration "premature optimization", since you aren't looking for an excuse to inject obtuse design characteristics without good reason. Data structures that can grow to arbitrary size and persist for arbitrary durations do need a little forethought.
Particularly since data structures tend to find their way into unplanned usages later and often by other developers, it is important to strike a reasonable balance in terms of clarity and anticipated behavior.
Write your structure properly, with your own allocation and deallocation functions, and implement those separately. Initially have them just malloc and free a single node, to make debugging easier. Later you can redesign them with fancier algorithms as your needs dictate.