I'm trying to get a deeper understanding of how the low level operations of programming languages work and especially how they interact with the OS/CPU. I've probably read every answer in every stack/heap related thread here on Stack Overflow, and they are all brilliant. But there is still one thing that I didn't fully understand yet.
Consider this function in pseudo code which tends to be valid Rust code ;-)
fn foo() {
let a = 1;
let b = 2;
let c = 3;
let d = 4;
// line X
doSomething(a, b);
doAnotherThing(c, d);
}
This is how I assume the stack to look like on line X:
Stack
a +-------------+
| 1 |
b +-------------+
| 2 |
c +-------------+
| 3 |
d +-------------+
| 4 |
+-------------+
Now, everything I've read about how the stack works is that it strictly obeys LIFO rules (last in, first out). Just like a stack datatype in .NET, Java or any other programming language.
But if that's the case, then what happens after line X? Because obviously, the next thing we need is to work with a
and b
, but that would mean that the OS/CPU (?) has to pop out d
and c
first to get back to a
and b
. But then it would shoot itself in the foot, because it needs c
and d
in the next line.
So, I wonder what exactly happens behind the scenes?
Another related question. Consider we pass a reference to one of the other functions like this:
fn foo() {
let a = 1;
let b = 2;
let c = 3;
let d = 4;
// line X
doSomething(&a, &b);
doAnotherThing(c, d);
}
From how I understand things, this would mean that the parameters in doSomething
are essentially pointing to the same memory address like a
and b
in foo
. But then again this means that there is no pop up the stack until we get to a
and b
happening.
Those two cases make me think that I haven't fully grasped how exactly the stack works and how it strictly follows the LIFO rules.
The Python interpreter uses a call stack to run a Python program. When a function is called in Python, a new frame is pushed onto the call stack for its local execution, and every time a function call returns, its frame is popped off the call stack.
The call stack is what a program uses to keep track of method calls. The call stack is made up of stack frames—one for each method call. For instance, say we called a method that rolled two dice and printed the sum. When random.
The call stack depth used in the analysis can be varied to suit your application. By default Freja uses a call stack depth of 1, that is, only the innermost calling function is considered in the analysis. A lower call stack depth reduces the number of reported issues by merging different call sites.
Now, whenever a function is called a new stack frame is created with all the function's data and this stack frame is pushed in the program stack, and the stack pointer that always points the top of the program stack points the stack frame pushed as it is on the top of the program stack.
The call stack could also be called a frame stack.
The things that are stacked after the LIFO principle are not the local variables but the entire stack frames ("calls") of the functions being called. The local variables are pushed and popped together with those frames in the so-called function prologue and epilogue, respectively.
Inside the frame the order of the variables is completely unspecified; Compilers "reorder" the positions of local variables inside a frame appropriately to optimize their alignment so the processor can fetch them as quickly as possible. The crucial fact is that the offset of the variables relative to some fixed address is constant throughout the lifetime of the frame - so it suffices to take an anchor address, say, the address of the frame itself, and work with offsets of that address to the variables. Such an anchor address is actually contained in the so-called base or frame pointer which is stored in the EBP register. The offsets, on the other hand, are clearly known at compile time and are therefore hardcoded into the machine code.
This graphic from Wikipedia shows what the typical call stack is structured like1:
Add the offset of a variable we want to access to the address contained in the frame pointer and we get the address of our variable. So shortly said, the code just accesses them directly via constant compile-time offsets from the base pointer; It's simple pointer arithmetic.
#include <iostream>
int main()
{
char c = std::cin.get();
std::cout << c;
}
gcc.godbolt.org gives us
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl std::cin, %edi
call std::basic_istream<char, std::char_traits<char> >::get()
movb %al, -1(%rbp)
movsbl -1(%rbp), %eax
movl %eax, %esi
movl std::cout, %edi
call [... the insertion operator for char, long thing... ]
movl $0, %eax
leave
ret
.. for main
. I divided the code into three subsections.
The function prologue consists of the first three operations:
Then cin
is moved into the EDI register2 and get
is called; The return value is in EAX.
So far so good. Now the interesting thing happens:
The low-order byte of EAX, designated by the 8-bit register AL, is taken and stored in the byte right after the base pointer: That is -1(%rbp)
, the offset of the base pointer is -1
. This byte is our variable c
. The offset is negative because the stack grows downwards on x86. The next operation stores c
in EAX: EAX is moved to ESI, cout
is moved to EDI and then the insertion operator is called with cout
and c
being the arguments.
Finally,
main
is stored in EAX: 0. That is because of the implicit return
statement.
You might also see xorl rax rax
instead of movl
.leave
is abbreviating this epilogue and implicitly
After this operation and ret
have been performed, the frame has effectively been popped, although the caller still has to clean up the arguments as we're using the cdecl calling convention. Other conventions, e.g. stdcall, require the callee to tidy up, e.g. by passing the amount of bytes to ret
.
It is also possible not to use offsets from the base/frame pointer but from the stack pointer (ESB) instead. This makes the EBP-register that would otherwise contain the frame pointer value available for arbitrary use - but it can make debugging impossible on some machines, and will be implicitly turned off for some functions. It is particularly useful when compiling for processors with only few registers, including x86.
This optimization is known as FPO (frame pointer omission) and set by -fomit-frame-pointer
in GCC and -Oy
in Clang; note that it is implicitly triggered by every optimization level > 0 if and only if debugging is still possible, since it doesn't have any costs apart from that.
For further information see here and here.
1 As pointed out in the comments, the frame pointer is presumably meant to point to the address after the return address.
2 Note that the registers that start with R are the 64-bit counterparts of the ones that start with E. EAX designates the four low-order bytes of RAX. I used the names of the 32-bit registers for clarity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With