Why is memory allocation on heap MUCH slower than on stack?

I have been told this many times, but I don't know why. What extra cost is involved when allocating memory from the heap? Is it hardware related? Is it a matter of CPU cycles? I have heard many guesses but no exact answers. Could someone elaborate?

Just as "unwind" said, the heap is a more complicated data structure than the stack. In my understanding, some memory space is allocated to a thread as its stack when it starts to run, while the heap is shared by all the threads within a process. This paradigm requires some extra mechanism to manage each thread's usage of the shared heap, such as garbage collection. Am I right about this?

asked Feb 15 '10 by smwikipedia



2 Answers

Because the heap is a far more complicated data structure than the stack.

For many architectures, allocating memory on the stack is just a matter of adjusting the stack pointer, i.e. it's a single instruction. Allocating memory on the heap involves searching for a big enough block, splitting it, and managing the bookkeeping that allows blocks to be released with free() in a different order than they were allocated.
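The contrast can be sketched in C. This is a minimal illustration, not code from the answer; the function names are my own:

```c
#include <stdlib.h>
#include <string.h>

/* Stack allocation: the compiler reserves space in the current stack
 * frame by adjusting the stack pointer -- effectively free.  The
 * buffer disappears automatically when the function returns. */
int sum_on_stack(const int *values, int n)
{
    int scratch[64];                  /* one stack-pointer adjustment */
    if (n > 64)
        return -1;
    memcpy(scratch, values, n * sizeof scratch[0]);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scratch[i];
    return total;
}                                     /* scratch is freed implicitly here */

/* Heap allocation: malloc must consult its bookkeeping structures to
 * find a large-enough block, and the caller must free() it explicitly,
 * possibly in a completely different order than it was allocated. */
int sum_on_heap(const int *values, int n)
{
    int *scratch = malloc(n * sizeof *scratch);  /* may fail at runtime */
    if (scratch == NULL)
        return -1;
    memcpy(scratch, values, n * sizeof *scratch);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scratch[i];
    free(scratch);                    /* explicit bookkeeping update */
    return total;
}
```

Both functions compute the same result; the difference is entirely in how the scratch buffer's lifetime is managed.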

Memory allocated on the stack is guaranteed to be deallocated when the scope (typically the function) exits, and it's not possible to deallocate just some of it.

answered Oct 31 '22 by unwind


In your edit where you restate unwind's answer, you mention the "heap data structure". Be careful: the data structure known as a heap has no relationship to dynamic memory allocation. To be very clear, I'll use the more language-lawyerly term free store.

As has already been pointed out, stack allocation merely requires adjusting a pointer, which has a dedicated register on most architectures, and deallocation requires the same amount of work. Stack allocations are also scoped to a particular function. This makes them much better candidates for compiler optimizations, such as precomputing the total space needed on the stack and reserving it with a single adjustment for an entire stack frame.

Likewise, the stack has better guaranteed data locality: the top of the stack is almost always inside a cache line, and, as mentioned already, the stack pointer is typically stored in a register. Optimizing compilers on some architectures can even eliminate stack allocations altogether by reusing storage from previous stack frames for arguments passed to functions in deeper stack frames. Likewise, stack-allocated variables can often be promoted to registers, avoiding allocations as well.
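As a small illustration of why stack allocation is a compile-time affair: the compiler knows the size of every local up front, so a typical implementation reserves the whole frame with a single stack-pointer adjustment, however many locals there are. The function below is hypothetical, and whether its locals end up in registers is of course compiler- and architecture-dependent:

```c
#include <stdint.h>

/* Three locals, one frame: a typical compiler computes the total frame
 * size at compile time and reserves it with one stack-pointer
 * adjustment on entry (e.g. a single `sub rsp, N` on x86-64).  Small
 * scalars like these are usually promoted straight to registers and
 * never touch memory at all -- there is no per-variable allocation
 * cost, unlike a sequence of malloc() calls. */
uint32_t mix(uint32_t x)
{
    uint32_t a = x * 2654435761u;   /* Knuth's multiplicative constant */
    uint32_t b = a ^ (a >> 16);
    uint32_t c = b + 0x9e3779b9u;
    return c;
}
```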

In contrast, the free store is much more complex. I'm not even going to begin to cover garbage collection systems as that's a whole different topic, and this question was asked about the C language. Typically allocations and deallocations from a free store involve several different data structures like a free list or block pool. These data structures and bookkeeping require memory as well, and thus that space is wasted. Furthermore, the bookkeeping records are often intermixed with the allocations and thus hurt the data locality of other allocations. Allocations from the free store may involve asking the underlying operating system for more process memory typically from some form of slab allocator.
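To make "several different data structures like a free list" concrete, here is a toy fixed-size-block allocator. It is a deliberately simplified sketch of my own, not taken from any real allocator: real free stores additionally handle variable sizes, splitting and coalescing, thread caches, and requests to the operating system.

```c
#include <stddef.h>

/* Toy fixed-size-block allocator: a sketch of the bookkeeping a free
 * store must do.  Each free block's first bytes store a pointer to the
 * next free block, forming a singly linked free list. */

#define BLOCK_SIZE  64
#define POOL_BLOCKS 128

static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
static void *free_list = NULL;     /* head of the free list */
static int pool_ready = 0;

static void pool_init(void)
{
    /* Thread every block onto the free list. */
    for (int i = 0; i < POOL_BLOCKS; i++) {
        *(void **)pool[i] = free_list;
        free_list = pool[i];
    }
    pool_ready = 1;
}

void *pool_alloc(void)
{
    if (!pool_ready)
        pool_init();
    if (free_list == NULL)
        return NULL;               /* pool exhausted */
    void *block = free_list;       /* pop the head of the free list */
    free_list = *(void **)block;
    return block;
}

void pool_free(void *block)
{
    *(void **)block = free_list;   /* push back onto the free list */
    free_list = block;
}
```

Even this toy needs a pool, a list head, and pointer threading through free blocks, versus the single pointer adjustment of a stack allocation; a production allocator layers far more machinery on top.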

For a simple comparison, using jemalloc-2.2.5 and numbers from sloccount as a reference: the jemalloc implementation contains over 8,800 lines of C source code and over 700 more lines of test code. This should give you a good idea of the difference in complexity between free store allocation and stack allocation: thousands of lines of C code versus a single instruction.

Additionally, since free store allocations are not limited to a single lexical scope, the lifetime of every allocation must be tracked. Likewise, these allocations may be passed across threads, and thus thread synchronization issues enter the problem space. Another big problem for free store allocation is fragmentation. Fragmentation causes many problems:

  • Fragmentation hurts data locality.
  • Fragmentation wastes memory.
  • Fragmentation makes the job of finding free space for large allocations harder.
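The third point can be simulated in a few lines. This toy first-fit allocator of my own tracks eight abstract slots rather than real memory, but it shows the essence of external fragmentation: plenty of free space in total, yet no contiguous run large enough to satisfy a bigger request.

```c
#include <string.h>

/* Toy first-fit allocator over 8 slots, to make fragmentation
 * concrete.  Purely illustrative: slot bookkeeping only. */
#define SLOTS 8
static unsigned char used[SLOTS];

/* Find n contiguous free slots (first fit) and mark them used.
 * Returns the start index, or -1 if no run is long enough. */
int alloc_slots(int n)
{
    for (int start = 0; start + n <= SLOTS; start++) {
        int ok = 1;
        for (int i = 0; i < n; i++)
            if (used[start + i]) { ok = 0; break; }
        if (ok) {
            memset(used + start, 1, n);
            return start;
        }
    }
    return -1;
}

void free_slots(int start, int n)
{
    memset(used + start, 0, n);
}

int count_free(void)
{
    int n = 0;
    for (int i = 0; i < SLOTS; i++)
        if (!used[i])
            n++;
    return n;
}
```

Fill all eight slots with one-slot allocations, free every other one, and half the pool is free again, yet a two-slot request still fails because no two free slots are adjacent. Real allocators fight exactly this effect with coalescing and size-class segregation.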

On modern systems, stacks are often relatively small compared to the free store, so ultimately the free store is managing more space and thus tackling a harder problem. Also, due to the limits on stack sizes, the free store is typically used for larger allocations; this discrepancy between having to handle both very large and very small allocations makes the job of the free store harder as well. Typically, stack allocations are small, on the order of a few kilobytes or less, and the total size of the stack is only a few megabytes. The free store is generally given the entire rest of the process space in a program. On modern machines this can be several hundred gigabytes, and it's not uncommon for free store allocations to vary in size from a few bytes, like a short string of characters, to megabytes or even gigabytes of arbitrary data. This means that free store allocators have to deal with the underlying operating system's virtual memory management. Stack allocation, by contrast, is essentially built into the computer hardware.

If you want to really learn about free store allocation, I would highly recommend reading some of the many papers and articles published about various malloc implementations, or even reading the code. Here are a few links to get you started:

  • dlmalloc - Doug Lea's malloc, a historical reference malloc implementation used in GNU C++ at one point in time
  • phkmalloc - FreeBSD implementation of malloc, written by Poul-Henning Kamp, author of the Varnish web cache
  • tcmalloc - Thread-Caching Malloc implemented by some Google developers
  • jemalloc - Jason Evans's malloc implementation for FreeBSD (successor of phkmalloc)

Here are some additional links with descriptions of the tcmalloc implementation:

  • http://jamesgolick.com/2013/5/15/memory-allocators-101.html
  • http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html

answered Oct 31 '22 by b4hand