Why are Rust stack frames so big?

I encountered an unexpectedly early stack overflow and created the following program to test the issue:

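// On current stable Rust (1.59+), asm! lives in std::arch.
// Older nightly toolchains needed #![feature(asm)] instead.
use std::arch::asm;

fn get_rsp() -> usize {
    let rsp: usize;
    unsafe {
        asm! {
            "mov {}, rsp",
            out(reg) rsp
        }
    }
    rsp
}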

fn useless_function(x: usize) {
    if x > 0 {
        println!("{:x}", get_rsp());
        useless_function(x - 1);
    }
}

fn main() {
    useless_function(10); // the recursion depth here is arbitrary
}

This is get_rsp disassembled (according to cargo-asm):

 push    rax
 #APP
 mov     rax, rsp
 #NO_APP
 pop     rcx
 ret

I'm not sure what #APP and #NO_APP do or why rax is pushed and then popped into rcx, but it seems the function does return the stack pointer.

I was surprised to find that in debug mode the difference between two consecutively printed rsp values was 192 bytes (!), and even in release mode it was 128. As far as I understand, all that needs to be stored for each call to useless_function is one usize and a return address, so I'd expect every stack frame to be around 16 bytes.

I'm running this with rustc 1.46.0 on a 64-bit Windows machine.

Are my results consistent across machines? How is this explained?

It seems that the use of println! has a pretty significant effect. In an attempt to avoid that, I changed the program (Thanks to @Shepmaster for the idea) to store the values in a static array:

static mut RSPS: [usize; 10] = [0; 10];

fn useless_function(x: usize) {
    unsafe { RSPS[x] = get_rsp() };
    if x == 0 {
        return;
    }
    useless_function(x - 1);
}

fn main() {
    useless_function(9); // start at the highest valid index into RSPS
    println!("{:?}", unsafe { RSPS });
}

The recursion gets optimised away in release mode, but in debug mode each frame still takes 80 bytes which is way more than I anticipated. Is this just the way stack frames work on x86? Do other languages do better? This seems a little inefficient.

Using formatting machinery like println! creates a number of things on the stack. Expanding the macros used in your code:

fn useless_function(x: usize) {
    if x > 0 {
        ::std::io::_print(::core::fmt::Arguments::new_v1(
            &["", "\n"],
            &match (&get_rsp(),) {
                (arg0,) => [::core::fmt::ArgumentV1::new(arg0, ::core::fmt::LowerHex::fmt)],
            },
        ));
        useless_function(x - 1);
    }
}

I believe that those structs consume the majority of the space. As an attempt to prove that, I printed the size of the value created by format_args, which is used by println!:

let sz = std::mem::size_of_val(&format_args!("{:x}", get_rsp()));
println!("{}", sz);

This shows that it is 48 bytes.
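
To repeat that measurement on another toolchain or target, a minimal sketch follows. The helper names `arguments_size` and `one_call_size` are made up for illustration, and the number printed depends on the compiler version and pointer width, so it may not be 48 on your machine.

```rust
use std::fmt;
use std::mem::{size_of, size_of_val};

/// Size in bytes of the `fmt::Arguments` type that `format_args!` produces.
fn arguments_size() -> usize {
    size_of::<fmt::Arguments<'static>>()
}

/// Size of one concrete `format_args!` value, measured the same way as above.
/// `value` is a placeholder standing in for the result of `get_rsp()`.
fn one_call_size() -> usize {
    let value: usize = 0xdead_beef;
    size_of_val(&format_args!("{:x}", value))
}

fn main() {
    println!("fmt::Arguments type:  {} bytes", arguments_size());
    println!("format_args! value:   {} bytes", one_call_size());
}
```

Both numbers describe the same type, so they should agree. This value is only one of the things `println!` places in the caller's frame, alongside the argument array it points to.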

Something like this should remove the printing from the equation, but the optimizer ignores the inline(never) hint here and inlines it anyway, so the sequential values all come out the same:

/// The length of `rsp` and the value of `x` must always match
#[inline(never)]
unsafe fn useless_function(x: usize, rsp: &mut [usize]) {
    if x > 0 {
        *rsp.get_unchecked_mut(0) = get_rsp();
        useless_function(x - 1, rsp.get_unchecked_mut(1..));
    }
}

fn main() {
    unsafe {
        let mut rsp = [0; 10];
        useless_function(rsp.len(), &mut rsp);

        for w in rsp.windows(2) {
            println!("{}", w[0] - w[1]);
        }
    }
}

That said, you can make the function public and look at its assembly anyway (lightly cleaned):

    pushq   %r15
    pushq   %r14
    pushq   %rbx
    testq   %rdi, %rdi
    je  .LBB6_3
    movq    %rsi, %r14
    movq    %rdi, %r15
    xorl    %ebx, %ebx

.LBB6_2:
    callq   playground::get_rsp
    movq    %rax, (%r14,%rbx,8)
    addq    $1, %rbx
    cmpq    %rbx, %r15
    jne .LBB6_2

.LBB6_3:
    popq    %rbx
    popq    %r14
    popq    %r15
    retq
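
The listing above calls get_rsp from inside a loop (the jne back to .LBB6_2) and never calls useless_function again, which is why every recorded value is identical. A rough Rust equivalent of that loop is sketched below. It is an illustration, not the compiler's actual output: `approx_stack_pointer` is a hypothetical, asm-free stand-in for `get_rsp` that returns the address of a local variable, and `fill_in_loop` is a made-up name.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for `get_rsp`: returns the address of a local
/// variable, which lies near the stack pointer during this call.
#[inline(never)]
fn approx_stack_pointer() -> usize {
    let marker = 0u8;
    black_box(&marker) as *const u8 as usize
}

/// Roughly what the optimized code does: one frame, one loop,
/// and one call to the stack-pointer helper per iteration.
fn fill_in_loop(x: usize, out: &mut [usize]) {
    for slot in out.iter_mut().take(x) {
        *slot = approx_stack_pointer();
    }
}

fn main() {
    let mut out = [0usize; 10];
    fill_in_loop(out.len(), &mut out);
    // Every call is made from the same frame, so the differences are zero.
    for w in out.windows(2) {
        println!("{}", w[0].abs_diff(w[1]));
    }
}
```

Because the helper is always called from the same frame, there is no per-call stack growth left to measure in this form.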

As for the observation that "in debug mode each frame still takes 80 bytes", compare the unoptimized assembly:

    subq    $104, %rsp
    movq    %rdi, 80(%rsp)
    movq    %rsi, 88(%rsp)
    movq    %rdx, 96(%rsp)
    cmpq    $0, %rdi
    movq    %rdi, 56(%rsp)                  # 8-byte Spill
    movq    %rsi, 48(%rsp)                  # 8-byte Spill
    movq    %rdx, 40(%rsp)                  # 8-byte Spill
    ja  .LBB44_2
    jmp .LBB44_8

.LBB44_2:
    callq   playground::get_rsp
    movq    %rax, 32(%rsp)                  # 8-byte Spill
    xorl    %eax, %eax
    movl    %eax, %edx
    movq    48(%rsp), %rdi                  # 8-byte Reload
    movq    40(%rsp), %rsi                  # 8-byte Reload
    callq   core::slice::<impl [T]>::get_unchecked_mut
    movq    %rax, 24(%rsp)                  # 8-byte Spill
    movq    24(%rsp), %rax                  # 8-byte Reload
    movq    32(%rsp), %rcx                  # 8-byte Reload
    movq    %rcx, (%rax)
    movq    56(%rsp), %rdx                  # 8-byte Reload
    subq    $1, %rdx
    setb    %sil
    testb   $1, %sil
    movq    %rdx, 16(%rsp)                  # 8-byte Spill
    jne .LBB44_9
    movq    $1, 72(%rsp)
    movq    72(%rsp), %rdx
    movq    48(%rsp), %rdi                  # 8-byte Reload
    movq    40(%rsp), %rsi                  # 8-byte Reload
    callq   core::slice::<impl [T]>::get_unchecked_mut
    movq    %rax, 8(%rsp)                   # 8-byte Spill
    movq    %rdx, (%rsp)                    # 8-byte Spill
    movq    16(%rsp), %rdi                  # 8-byte Reload
    movq    8(%rsp), %rsi                   # 8-byte Reload
    movq    (%rsp), %rdx                    # 8-byte Reload
    callq   playground::useless_function
    jmp .LBB44_8

.LBB44_8:
    addq    $104, %rsp
    retq

.LBB44_9:
    leaq    str.0(%rip), %rdi
    leaq    .L__unnamed_7(%rip), %rdx
    movq    core::panicking::panic@GOTPCREL(%rip), %rax
    movl    $33, %esi
    callq   *%rax
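
In the unoptimized listing, the function reserves 104 bytes with subq and each callq pushes an 8-byte return address, so consecutive frames of that build sit 112 bytes apart. To check how this varies across machines, targets and optimization levels without inline assembly, one option is to record the address of a local variable in each frame. The sketch below assumes that such an address tracks the stack pointer closely enough for comparing frames. The names `record_frames` and `frame_deltas` are made up for illustration.

```rust
use std::hint::black_box;

/// Records the address of a local in every frame of the recursion.
#[inline(never)]
fn record_frames(depth: usize, addrs: &mut Vec<usize>) {
    let marker = 0u8;
    addrs.push(black_box(&marker) as *const u8 as usize);
    if depth > 0 {
        record_frames(depth - 1, addrs);
    }
    // Use `marker` after the recursive call so the call is not in tail
    // position and each frame has to stay alive while its callees run.
    black_box(&marker);
}

/// Distance in bytes between the markers of consecutive frames.
fn frame_deltas(depth: usize) -> Vec<usize> {
    let mut addrs = Vec::with_capacity(depth + 1);
    record_frames(depth, &mut addrs);
    addrs.windows(2).map(|w| w[0].abs_diff(w[1])).collect()
}

fn main() {
    println!("{:?}", frame_deltas(5));
}
```

Running this once with `cargo run` and once with `cargo run --release` gives a per-frame figure for each build. The exact numbers depend on the compiler version and the platform's calling convention.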
