Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are stack overflows still a problem?

People also ask

Are stack overflows still relevant?

The bottom line: stack (and heap) overflows are absolutely still relevant today. They're harder to exploit than they used to be but they're still relevant. To complement, in a few specific contexts buffer overflows can be easier to exploit because of memory layout leaks, e.g. in kernels.

Why do we get stack overflow error?

StackOverflowError is a runtime error which points to serious problems that cannot be caught by an application. The java. lang. StackOverflowError indicates that the application stack is exhausted and is usually caused by deep or infinite recursion.

What happened with stack overflow?

Stack Overflow was sold to Prosus, a Netherlands-based consumer internet conglomerate, on 2 June 2021 for $1.8 billion.


I've never personally encountered a stack overflow that wasn't caused by infinite recursion. In these cases, a dynamic stack size wouldn't help, it would just take a little longer to run out of memory.


1) In order to resize stacks, you have to be able to move memory around, meaning that pointers to anything on a stack can become invalid after a stack resize. Yes, you can use another level of indirection to solve this problem, but remember that the stack is used very, very frequently.

2) It significantly makes things more complicated. Push/pop operations on stacks usually work simply by doing some pointer arithmetic on a CPU register. That's why allocation on a stack is faster than allocation on the free-store.

3) Some CPUs (microcontrollers in particular) implement the stack directly on hardware, separate from the main memory.

Also, you can set the size of a stack of a thread when you create a new thread using beginthread(), so if you find that the extra stack space is unnecessary, you can set the stack size accordingly.

From my experience, stack overflows are usually caused by infinite recursions or recursive functions that allocate huge arrays on the stack. According to MSDN, the default stack size set by the linker is 1MB (the header of executable files can set their own default), which seems to be more than big enough for a majority of cases.

The fixed-stack mechanism works well enough for a majority of applications, so there's no real need to go change it. If it doesn't, you can always roll out your own stack.


I can't speak for "major languages". Many "minor" languages do heap-allocated activation records, with each call using a chunk of heap space instead of a linear stack chunk. This allows recursion to go as deep as you have address space to allocate.

Some folks here claim that recursion that deep is wrong, and that using a "big linear stack" is just fine. That isn't right. I'd agree that if you have to use the entire address space, you do a problem of some kind. However, when one has very large graph or tree structures, you want to allow deep recursion and you don't want to guess at how much linear stack space you need first, because you'll guess wrong.

If you decide to go parallel, and you have lots (thousand to million of "grains" [think, small threads]) you can't have 10Mb of stack space allocated to each thread, because you'll be wasting gigabytes of RAM. How on earth could you ever have a million grains? Easy: lots of grains that interlock with one another; when a grain is frozen waiting for a lock, you can't get rid of it, and yet you still want to run other grains to use your available CPUs. This maximizes the amount of available work, and thus allows many physical processors to be used effectively.

The PARLANSE parallel programming language uses this very-large-number of parallel grains model, and heap allocation on function calls. We designed PARLANSE to enable the symbolic analysis and transformation of very large source computer programs (say, several million lines of code). These produce... giant abstract syntax trees, giant control/data flow graphs, giant symbol tables, with tens of millions of nodes. Lots of opportunity for parallel workers.

The heap allocation allows PARLANSE programs to be lexically scoped, even across parallelism boundaries, because one can implement "the stack" as a cactus stack, where forks occur in "the stack" for subgrains, and each grain can consequently see the activation records (parent scopes) of its callers. This makes passing big data structures cheap when recursing; you just reference them lexically.

One might think that heap allocation slows down the program. It does; PARLANSE pays about a 5% penalty in performance but gains the ability to process very large structures in parallel, with as many grains as the address space can hold.


Stacks are resized dynamically - or to be precise, grown dynamically. You get an overflow when a stack cannot grow any further, which is not to say it exhausted the address space, but rather grown to conflict with a portion of memory used to other purposes (e.g., a process heap).

Maybe you mean that stacks cannot be moved dynamically? The root of that is probably that stacks are intimately coupled to the hardware. CPUs have registers and piles of logic dedicated to thread stack management (esp, ebp, call/return/enter/leave instructions on x86). If your language is compiled (or even jitted) you're bound to the hardware mechanism and cannot move stacks around.

This hardware 'limitation' is probably here to stay. Re-basing a thread stack during thread execution seems far from a reasonable demand from a hardware platform (and the added complexity would badly hamper all executed code on such an imaginary CPU, even compiled). One can picture a completely virtualized environment where this limitation does not hold, but since such code couldn't be jitted - it would be unbearably slow. Not a chance you could do anything interactive with it.


I am going to summarize the arguments in the answers so far because I find no answer covering this topic good enough.

Static stack investigation

Motivation

Not everyone needs it.

  • Most algorithms do not use deep recursion or a lot of threads, thus not a lot of people need dynamic stacks.
  • Dynamic stack would make an infinite-recursion stack overflow, which is an easy mistake to make, harder to diagnose. (memory overflow, while being as deadly as a stack overflow to the current process, is hazardous for other processess as well)
  • Every recursive algorithm can be emulated with a similar iterative one.

Implementation difficulties

Dynamic stack implementation turns out to be not as straightforward as it seems.

  • Stack resizing alone is not enough unless you have unlimited address space. You will sometimes need to relocate the stack as well.
  • Stack relocation would require updates for all the pointers to the data structures allocated on the stack. While it is straightforward (at least in managed languages) for the data in memory, there is no easy way to do the same for data in the CPU registers of the thread.
  • Some CPUs (microcontrollers in particular) implement the stack directly on hardware, separate from the main memory.

Existing implementations

There are some languages or runtime libraries that already have the dynamic stack feature or something similar to it.

  • Some runtime libraries (which?) do not pre-commit the entire block of memory allocated for stack. This can alleviate the problem, expecially for 64-bit systems, but not completely eliminate it.
  • Ira Baxter told us about PARLANSE, a language specifically designed for dealing with complex data structures with high degree of parallelism. It uses small heap-allocated "grains" of work instead of stack.
  • fuzzy lolipop told us that "Properly written Erlang doesn't have stackoverflows!"
  • Google Go programming language is said to have a dynamic stack. (a link would be nice)

I would like to see more examples here.

I hope I didn't forget any important pieces of information on this subject. Making this a community wiki so that anyone can add new information.


Why do we, programmers, still have this StackOverflow problem?

Stack of fixed size is easy to implement, and is acceptable for 99% of programs. "stack overflow" is a minor problem, that is somewhat rare. So there is no real reason to change things. Also, it is not a language problem, it is more related to platform/processor design, so you'll have to deal with it.

There is no way to write a recursive algorithm unless you are absolutely sure that the depth of recursion is tiny. Linear memory complexity of the recursive algorithm is often unacceptable.

Now this is incorrect. In recursive algorithm you can (almost?) always replace actual recursive call with some kind of container - list, std::vector, stack, array, FIFO queue, etc, that will act like stack. Calculation will "pop" arguments from the end of the container, and push new arguments into either end or beginning of container. Normally, the only limit on size of such container is total amount of RAM.

Here is a crude C++ example:

#include <deque>
#include <iostream>

size_t fac(size_t arg){
    std::deque<size_t> v;
    v.push_back(arg);
    while (v.back() > 2)
        v.push_back(v.back() - 1);
    size_t result = 1;
    for (size_t i = 0; i < v.size(); i++)
        result *= v[i];
    return result;
}

int main(int argc, char** argv){
    int arg = 12;
    std::cout << " fac of " << arg << " is " << fac(arg) << std::endl;
    return 0;
}

Less elegant than recursion, but no stackoverflow problem. Technically, we're "emulating" recursion in this case. You can think that stackoverflow is a hardware limitation you have to deal with.


I think we will see this restriction removed in a few years.

There is simply no fundamental technical reason for fixed size stackes. They exist for historical reasons and because the programmers of compilers and VM's are lazy and don't optimize if it is good enough right now.

But GO the google language already starts with a different approach. It allocates the stack in small 4K pieces. There are also many "stackless" programming language extensions like stackless python etc who are doing the same.

The reason for this is quite simple, the more threads you have the more address space is wasted. For programs which are slower with 64bit pointers it is a serious problem. You can't really have more then hundert threads in practice. This is not good if you write a server which might want to server 60000 clients with a thread for each one (wait for the 100 core/cpu systems in the near future).

On 64bit systems it's not so serious but it still requires more resources. For example TLB entries for pages are extremely serious for good performance. If you can satisfy 4000 normal thread stackes with one single TLB entry (given a page size of 16MB and 4KB active stack space) you can see the difference. Don't waste 1020KB just for stack that you almost never use.

Small grained multithreading will be a very very important technique in the future.