In the C programming language, it's easy to have tail recursion:
int foo(...) {
    return foo(...);
}
Just return the result of the recursive call as-is. This is especially important when the recursion may repeat a thousand or even a million times: without tail-call elimination, each call would consume additional stack memory.
Now, I have a Rust function that might recursively call itself a million times:
fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
    match input.read(&mut [0u8]) {
        Ok(0) => Ok(()),
        Ok(_) => read_all(input),
        Err(err) => Err(err),
    }
}
(This is a minimal example; the real function is more complex, but it captures the main idea.)
Here, the return value of the recursive call is returned as-is, but:
Does this guarantee that the Rust compiler will apply tail call elimination?
For instance, if we declare some variable that needs to be destroyed, like a Vec, will it be destroyed just before the recursive call (which allows tail recursion) or after the recursive call returns (which forbids it)?
Shepmaster's answer explains that tail call elimination is merely an optimization, not a guarantee, in Rust. But "never guaranteed" doesn't mean "never happens". Let's take a look at what the compiler does with some real code.
As of right now, the latest release of Rust available on Compiler Explorer is 1.39, and it does not eliminate the tail call in read_all.
example::read_all:
push r15
push r14
push rbx
sub rsp, 32
mov r14, rdx
mov r15, rsi
mov rbx, rdi
mov byte ptr [rsp + 7], 0
lea rdi, [rsp + 8]
lea rdx, [rsp + 7]
mov ecx, 1
call qword ptr [r14 + 24]
cmp qword ptr [rsp + 8], 1
jne .LBB3_1
movups xmm0, xmmword ptr [rsp + 16]
movups xmmword ptr [rbx], xmm0
jmp .LBB3_3
.LBB3_1:
cmp qword ptr [rsp + 16], 0
je .LBB3_2
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call qword ptr [rip + example::read_all@GOTPCREL]
jmp .LBB3_3
.LBB3_2:
mov byte ptr [rbx], 3
.LBB3_3:
mov rax, rbx
add rsp, 32
pop rbx
pop r14
pop r15
ret
mov rbx, rax
lea rdi, [rsp + 8]
call core::ptr::real_drop_in_place
mov rdi, rbx
call _Unwind_Resume@PLT
ud2
Notice this line: call qword ptr [rip + example::read_all@GOTPCREL]. That's the (tail) recursive call. As you can tell from its existence, it was not eliminated.
Compare this to an equivalent function with an explicit loop:
pub fn read_all(input: &mut dyn std::io::Read) -> std::io::Result<()> {
    loop {
        match input.read(&mut [0u8]) {
            Ok(0) => return Ok(()),
            Ok(_) => continue,
            Err(err) => return Err(err),
        }
    }
}
which has no tail call to eliminate, and therefore compiles to a function with only one call in it (to the computed address of input.read).
Oh well. Maybe Rust isn't as good as C. Or is it?
Here's a tail-recursive function in C that performs a very similar task:
int read_all(FILE *input) {
    char buf[] = {0, 0};
    if (!fgets(buf, sizeof buf, input))
        return feof(input);
    return read_all(input);
}
This should be super easy for the compiler to eliminate. The recursive call is right at the bottom of the function and C doesn't have to worry about running destructors. But nevertheless, there's that recursive tail call, annoyingly not eliminated:
call read_all
It turns out that tail call optimization is not guaranteed in C, either. I tried Clang and gcc under different optimization levels, but nothing I tried would turn this fairly simple recursive function into a loop.
Okay, so it's not guaranteed. Can the compiler do it at all? Yes! Here's a function that computes Fibonacci numbers via a tail-recursive inner function:
pub fn fibonacci(n: u64) -> u64 {
    fn f(n: u64, a: u64, b: u64) -> u64 {
        match n {
            0 => a,
            _ => f(n - 1, a + b, a),
        }
    }
    f(n, 0, 1)
}
Not only is the tail call eliminated, the whole inner function f is inlined into fibonacci, yielding only 12 instructions (and not a call in sight):
example::fibonacci:
push 1
pop rdx
xor ecx, ecx
.LBB0_1:
mov rax, rdx
test rdi, rdi
je .LBB0_3
dec rdi
add rcx, rax
mov rdx, rcx
mov rcx, rax
jmp .LBB0_1
.LBB0_3:
ret
If you compare this to an equivalent while loop, the compiler generates almost the same assembly.
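For reference, here is one way that while loop might be written (my own sketch, not code from the answer; the name fibonacci_loop and the exact formulation are assumptions):

pub fn fibonacci_loop(mut n: u64) -> u64 {
    // Same state transition as the recursive f: (a, b) becomes (a + b, a).
    let (mut a, mut b) = (0u64, 1u64);
    while n > 0 {
        let next = a + b;
        b = a;
        a = next;
        n -= 1;
    }
    a
}

The loop carries the same pair of accumulators that the recursive version passes as arguments, which is why the optimizer ends up with nearly identical code for both.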
You probably shouldn't be relying on optimizations to eliminate tail calls, either in Rust or in C. It's nice when it happens, but if you need to be sure that a function compiles into a tight loop, the surest way, at least for now, is to use a loop.
Neither tail recursion (reusing a stack frame for a tail call to the same function) nor tail call optimization (reusing the stack frame for a tail call to any function) is ever guaranteed by Rust, although the optimizer may choose to perform them.
"if we declare some variable that needs to be destroyed"
It's my understanding that this is one of the sticking points, as changing where stack variables are destroyed would be a contentious change.
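A minimal sketch of why that matters (my own illustration, not code from either answer; read_all_buf is a hypothetical variant of the question's function): a local that owns heap memory, such as a Vec, is dropped when the function's scope ends, which is after the recursive call returns, and that pending drop is exactly what keeps the call out of tail position unless the optimizer can prove it is safe to run the drop earlier.

fn read_all_buf(input: &mut dyn std::io::Read) -> std::io::Result<()> {
    // `buf` owns a heap allocation and must be dropped before this frame is torn down.
    let mut buf = vec![0u8; 1];
    match input.read(&mut buf) {
        Ok(0) => Ok(()),
        // `buf` is still live here; conceptually its drop runs only after the
        // recursive call returns, so the caller's frame cannot simply be reused.
        Ok(_) => read_all_buf(input),
        Err(err) => Err(err),
    }
}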