I'm debugging a rather weird stack overflow, supposedly caused by allocating variables that are too large on the stack, and I'd like to clarify the following.
Suppose I have the following function:
    void function()
    {
        char buffer[1 * 1024];
        if( condition ) {
            char buffer[1 * 1024];
            doSomething( buffer, sizeof( buffer ) );
        } else {
            char buffer[512 * 1024];
            doSomething( buffer, sizeof( buffer ) );
        }
    }
I understand that it's compiler-dependent and also depends on what the optimizer decides, but what is the typical strategy for allocating memory for those local variables?
Will the worst case (1 + 512 kilobytes) be allocated immediately once the function is entered, or will 1 kilobyte be allocated first and then, depending on the condition, either 1 or 512 kilobytes be allocated in addition?
For local variables, the memory they consume is on the stack. This means that they must have a fixed size known at compile time, so that when the function is called, the exact amount of memory needed is added to the stack by changing the value of the stack pointer.
Memory allocation is done either before or during program execution. There are two types of memory allocation: compile-time (static) allocation and run-time (dynamic) allocation.
Local variables are stored in the stack section, while the heap section contains objects and may also contain reference variables.
The memory for a function's local variables is allocated each time the function is called, on the stack.
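A minimal sketch of that behaviour, assuming an ordinary implementation with a call stack: every active call of the function gets its own copy of the local array, so the addresses recorded during a recursion are all different. The names recordFrames and allDistinct are hypothetical helpers made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: recurses `depth` times and records the address of
// the local array in every call that is active at the same time.
void recordFrames(int depth, std::vector<std::uintptr_t>& addresses) {
    assert(depth >= 1);
    volatile char local[64];  // volatile so the array is not optimized away
    local[0] = static_cast<char>(depth);
    addresses.push_back(reinterpret_cast<std::uintptr_t>(&local[0]));
    if (depth > 1) {
        recordFrames(depth - 1, addresses);
    }
    // Use the array after the recursive call so this call's copy must
    // stay alive while the nested calls run.
    local[1] = local[0];
}

// Returns true when no address occurs twice (works on a sorted copy).
bool allDistinct(std::vector<std::uintptr_t> addresses) {
    std::sort(addresses.begin(), addresses.end());
    return std::adjacent_find(addresses.begin(), addresses.end()) == addresses.end();
}
```

Calling recordFrames(5, addresses) on an empty vector should leave five addresses in it, one per nested call, and allDistinct should report that none of them coincide.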
On many platforms/ABIs, the entire stack frame (including memory for every local variable) is allocated when you enter the function. On others, it's common to push/pop memory bit by bit, as it is needed.
Of course, in cases where the entire stack frame is allocated in one go, different compilers might still decide on different stack frame sizes. In your case, some compilers would miss an optimization opportunity and allocate unique memory for every local variable, even the ones that are in different branches of the code (both the 1 * 1024 array and the 512 * 1024 one in your case), whereas a better optimizing compiler should only allocate the maximum memory required by any path through the function (the else path in your case, so a 512 KB block for the branch-local arrays should be enough). If you want to know what your platform does, look at the disassembly.
But it wouldn't surprise me to see the entire chunk of memory allocated immediately.
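If reading disassembly is inconvenient, a rough runtime probe can give a hint. The sketch below records where each branch-local array ended up, so the two ranges can be compared afterwards. BufferInfo, inspect, probe and rangesOverlap are hypothetical names invented here, and the result depends on compiler, flags and inlining, so treat it as a hint rather than proof.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of where a buffer lived and how large it was.
struct BufferInfo {
    std::uintptr_t address;  // start address of the buffer
    std::size_t size;        // size of the buffer in bytes
};

// Writes to both ends of the buffer through a volatile pointer so the
// compiler has to keep the array, then reports its location.
BufferInfo inspect(volatile char* buffer, std::size_t size) {
    assert(size > 0);
    buffer[0] = 1;
    buffer[size - 1] = 1;
    return BufferInfo{reinterpret_cast<std::uintptr_t>(buffer), size};
}

// Same shape as the function in the question: one array per branch.
BufferInfo probe(bool condition) {
    if (condition) {
        char smallBuffer[1 * 1024];
        return inspect(smallBuffer, sizeof(smallBuffer));
    } else {
        char largeBuffer[512 * 1024];
        return inspect(largeBuffer, sizeof(largeBuffer));
    }
}

// True when the two address ranges share at least one byte.
bool rangesOverlap(const BufferInfo& a, const BufferInfo& b) {
    return a.address < b.address + b.size && b.address < a.address + a.size;
}
```

Calling probe(true) and probe(false) from the same caller and passing both results to rangesOverlap suggests whether the two arrays were given the same stack region. Whatever it reports, the disassembly remains the authoritative answer.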
I checked on LLVM:
    void doSomething(char*,char*);

    void function(bool b)
    {
        char b1[1 * 1024];
        if( b ) {
            char b2[1 * 1024];
            doSomething(b1, b2);
        } else {
            char b3[512 * 1024];
            doSomething(b1, b3);
        }
    }
Yields:
    ; ModuleID = '/tmp/webcompile/_28066_0.bc'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
    target triple = "x86_64-unknown-linux-gnu"

    define void @_Z8functionb(i1 zeroext %b) {
    entry:
      %b1 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
      %b2 = alloca [1024 x i8], align 1               ; <[1024 x i8]*> [#uses=1]
      %b3 = alloca [524288 x i8], align 1             ; <[524288 x i8]*> [#uses=1]
      %arraydecay = getelementptr inbounds [1024 x i8]* %b1, i64 0, i64 0 ; <i8*> [#uses=2]
      br i1 %b, label %if.then, label %if.else

    if.then:                                          ; preds = %entry
      %arraydecay2 = getelementptr inbounds [1024 x i8]* %b2, i64 0, i64 0 ; <i8*> [#uses=1]
      call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay2)
      ret void

    if.else:                                          ; preds = %entry
      %arraydecay6 = getelementptr inbounds [524288 x i8]* %b3, i64 0, i64 0 ; <i8*> [#uses=1]
      call void @_Z11doSomethingPcS_(i8* %arraydecay, i8* %arraydecay6)
      ret void
    }

    declare void @_Z11doSomethingPcS_(i8*, i8*)
You can see the 3 alloca instructions at the top of the function.
I must admit I am slightly disappointed that b2 and b3 are not folded together in the IR, since only one of them will ever be used.
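To put numbers on it, here is a small compile-time sketch of the two layouts. The constants mirror the three arrays b1, b2 and b3 from the listing; the constant and function names are mine, and the totals ignore alignment padding, saved registers and anything else a real stack frame contains.

```cpp
#include <cassert>
#include <cstddef>

// Array sizes taken from the example function.
constexpr std::size_t kB1 = 1 * 1024;    // b1, live in both branches
constexpr std::size_t kB2 = 1 * 1024;    // b2, only used in the if branch
constexpr std::size_t kB3 = 512 * 1024;  // b3, only used in the else branch

// One slot per array, as the three alloca instructions suggest.
constexpr std::size_t separateSlots() { return kB1 + kB2 + kB3; }

// b2 and b3 folded into one slot that is big enough for the larger one.
constexpr std::size_t sharedSlots() { return kB1 + (kB2 > kB3 ? kB2 : kB3); }

static_assert(separateSlots() == 526336, "1 KB + 1 KB + 512 KB");
static_assert(sharedSlots() == 525312, "1 KB + max(1 KB, 512 KB)");

// Bytes saved by folding; a shared slot never needs more than separate ones.
std::size_t savedBytes() {
    assert(separateSlots() >= sharedSlots());
    return separateSlots() - sharedSlots();
}
```

In this particular example folding b2 and b3 would save only the size of the smaller array, 1 KB, so the 512 KB array dominates the frame either way.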