Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can/does the (forward) pipe operator prevent tail call optimization?

For a parameter optimization problem at work I wrote a genetic algorithm to find some good settings because a brute-force solution is unfeasible. Unfortunately, when I return in the morning, most of the time I'm presented with a StackOverflowException.

I've been using F# for quite some time now so I'm aware of TCO and the need for functions with accumulator arguments and generally use that form.

After a lot of searching I think I was able to nail to the code that triggered the exception:

breedPopulation alive |> simulate (generation + 1) lastTime ewma

breedPopulation generates a new generation from the individuals in the current alive one. Then the next round/generation is started with the call to simulate. When I look at the disassembly (total noob) I spot some pop and a ret, so it does not look like a regular (non-tail) call to me.

mov         rcx,qword ptr [rbp+10h]  
mov         rcx,qword ptr [rcx+8]  
mov         rdx,qword ptr [rbp-40h]  
cmp         dword ptr [rcx],ecx  
call        00007FFA3E4905C0  
mov         qword ptr [rbp-0F0h],rax  
mov         r8,qword ptr [rbp-0F0h]  
mov         qword ptr [rbp-80h],r8  
mov         r8,qword ptr [rbp-78h]  
mov         qword ptr [rsp+20h],r8  
mov         r8d,dword ptr [rbp+18h]  
inc         r8d  
mov         rdx,qword ptr [rbp+10h]  
mov         r9,qword ptr [rbp-20h]  
mov         rcx,7FFA3E525960h  
call        00007FFA3E4A5040  
mov         qword ptr [rbp-0F8h],rax  
mov         rcx,qword ptr [rbp-0F8h]  
mov         rdx,qword ptr [rbp-80h]  
mov         rax,qword ptr [rbp-0F8h]  
mov         rax,qword ptr [rax]  
mov         rax,qword ptr [rax+40h]  
call        qword ptr [rax+20h]  
mov         qword ptr [rbp-100h],rax  
mov         rax,qword ptr [rbp-100h]  
lea         rsp,[rbp-10h]  
pop         rsi  
pop         rdi  
pop         rbp  
ret

After throwing away the pipe operator and putting the breeding in a normal parameter position, the disassembly is different.

//    simulate (generation + 1) lastTime ewma (breedPopulation alive)
mov         ecx,dword ptr [rbp+18h]  
inc         ecx  
mov         dword ptr [rbp-30h],ecx  
mov         rcx,qword ptr [rbp-20h]  
mov         qword ptr [rbp-38h],rcx  
mov         rcx,qword ptr [rbp-80h]  
mov         qword ptr [rbp-0F0h],rcx  
mov         rcx,qword ptr [rbp+10h]  
mov         rcx,qword ptr [rcx+8]  
mov         rdx,qword ptr [rbp-48h]  
cmp         dword ptr [rcx],ecx  
call        00007FFA3E4605C0  
mov         qword ptr [rbp-0F8h],rax  
mov         rax,qword ptr [rbp-0F8h]  
mov         qword ptr [rbp+30h],rax  
mov         rax,qword ptr [rbp-0F0h]  
mov         qword ptr [rbp+28h],rax  
mov         rax,qword ptr [rbp-38h]  
mov         qword ptr [rbp+20h],rax  
mov         eax,dword ptr [rbp-30h]  
mov         dword ptr [rbp+18h],eax  
nop  
jmp         00007FFA3E47585B

That's definitely shorter and with the final jmp even better that a tail call.

Therefore I want to understand if and why the |> seems to be the problem and when it does make a difference—after all, this is the first time it bites me after years. Under what circumstances does it happen and what do we have to watch out for?


Update: After Guy pointed out that my listings are not IL but assembly, I first reworded the question. This is what I found out with ILSpy:

With the |> operator

Looking at the decompiled C#, the code seems to jump back and forth between

internal static FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]> simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma)
{
    return new $Universe.simulate@267-2(x, pleaseStop, generation, lastTime, ewma);
}

and

// internal class simulate@267-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(Types.Genome[] population)
{
    LbpArea[][] array = ArrayModule.Parallel.Map<Types.Genome, LbpArea[]>(this.x.genomeToArray, population);
    FSharpFunc<System.Tuple<System.Tuple<float, float>, LbpArea[]>, float> accessFitness = this.x.accessFitness;
    System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array2 = ArrayModule.Filter<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@274(accessFitness), ArrayModule.Parallel.Map<LbpArea[], System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@273-1(this.x), array));
    if (array2 == null)
    {
        throw new System.ArgumentNullException("array");
    }
    System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array3 = ArrayModule.SortWith<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@275-2(), array2);
    this.x.Population = array3;
    System.Tuple<System.DateTime, FSharpOption<double>> tuple = this.x.printProgress<float, LbpArea[]>(this.lastTime, this.ewma, this.generation, array3);
    System.DateTime item = tuple.Item1;
    FSharpOption<double> item2 = tuple.Item2;
    if (this.pleaseStop.WaitOne(0))
    {
        return array3;
    }
    Types.Genome[] func = this.x.breedPopulation(array3);
    return $Universe.simulate@265-1(this.x, this.pleaseStop, this.generation + 1, item, item2).Invoke(func);
}

In the IL of the new call there is no tail. op to be found. On the other hand, the IL of the last lines of the Invoke read

IL_00d3: call class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]> '<StartupCode$BioID-GeneticLbp>.$Universe'::'simulate@265-1'(class BioID.GeneticLbp.Universe, class [mscorlib]System.Threading.ManualResetEvent, int32, valuetype [mscorlib]System.DateTime, class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<float64>)
IL_00d8: ldloc.s 7
IL_00da: tail.
IL_00dc: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]>::Invoke(!0)
IL_00e1: ret

I don't know what to make of that.

Without |> operator

The other version is indeed very different. Starting with

internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@264(Universe x, System.Threading.ManualResetEvent pleaseStop, Unit unitVar0)
{
    FSharpFunc<int, FSharpFunc<System.DateTime, FSharpFunc<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>>> fSharpFunc = new $Universe.simulate@265-2(x, pleaseStop);
    (($Universe.simulate@265-2)fSharpFunc).x = x;
    (($Universe.simulate@265-2)fSharpFunc).pleaseStop = pleaseStop;
    System.Tuple<System.Tuple<float, float>, LbpArea[]>[] population = x.Population;
    Types.Genome[] func;
    if (population != null && population.Length == 0)
    {
        func = x.lengthRandomlyIncreasing([email protected]@);
        return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
    }
    FSharpFunc<LbpArea[], Types.Genome> arrayToGenome = x.arrayToGenome;
    func = ArrayModule.Parallel.Map<System.Tuple<System.Tuple<float, float>, LbpArea[]>, Types.Genome>(new $Universe.simulate@296-3(arrayToGenome), population);
    return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
}

it goes to

// internal class simulate@265-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
    return $Universe.simulate@265-1(this.x, this.pleaseStop, generation, lastTime, ewma, population);
}

and finally

internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
    while (true)
    {
        // Playing evolution...
        if (pleaseStop.WaitOne(0))
        {
            return array3;
        }
        // Setting up parameters for next loop...
    }
    throw new System.ArgumentNullException("array");
}

tl;dr

So definitely, the usage of the pipe operator drastically changed the program flow. My guess is that the back and forth between the two function is what eventually causes the exception.

I had already read Tail Calls in F# but I don't think it applies to this situation, as I'm not using a first-class function returning unit as value (in my F# code).

So the question remains: Why does the pipe operator have this destructive effect here? How could I've known beforehand/what do I need to watch out for?


Update 2:

You can find a reduced version of the example at GitHub. Please see for yourself that the inline operator |> changes the produces IL, which is not what I would expect.

While reducing the example, with a little luck I was able to find the real source of the exception. You can check the branch for the much more minimal variant. After all, it doesn't have anything to do with the pipe, but I still don't get it because IMHO there is tail recursion.

But my original questions remain. I'm just adding one more. :)

like image 243
primfaktor Avatar asked Mar 01 '16 12:03

primfaktor


1 Answers

Based on the minimal case as provided, if the code is run in release mode in 64-bit it fails with a stack overflow. If the code is run in release mode in 32-bit mode it succeeds.

Note: The option to choose between 32-bit and 64-bit is Prefer 32-bit as seen in the images below.

Increasing the stack size will result in the code succeeding in release mode in 64-bit. This is done via the use of the Thread constructor.

[<EntryPoint>]
let main _ =

    let test () =
        let r = KissRandom()
        let n = r.Normal()
        Seq.item 20000 n |> printfn "%f"

    /// The greatest maximum-stack-size that should be used
    /// with the 'runWithStackFrame' function.
    let STACK_LIMIT = 16777216

    /// Run a function with a custom maximum stack size.
    /// This is necessary for some functions to execute
    /// without raising a StackOverflowException.
    let runWithCustomStackSize maxStackSize fn =
        // Preconditions
        if maxStackSize < 1048576 then
            invalidArg "stackSize" "Functions should not be executed with a \
                maximum stack size of less than 1048576 bytes (1MB)."
        elif maxStackSize > STACK_LIMIT then
            invalidArg "stackSize" "The maximum size of the stack frame should \
                not exceed 16777216 bytes (16MB)."

        /// Holds the return value of the function.
        let result = ref Unchecked.defaultof<'T>

        // Create a thread with the specified maximum stack size,
        // then immediately execute the function on it.
        let thread = System.Threading.Thread ((fun () -> result := fn()), maxStackSize)
        thread.Start ()

        // Wait for the function/thread to finish and return the result.
        thread.Join ()
        !result

    /// Runs a function within a thread which has an enlarged maximum-stack-size.
    let inline runWithEnlargedStack fn =
        runWithCustomStackSize STACK_LIMIT fn


//    test ()       // Fails with stack overflow in 64-bit mode, Release
                    // Runs successfully in 32-bit mode, Release

    runWithEnlargedStack test
    
    printf "Press any key to exit: "
    System.Console.ReadKey() |> ignore
    printfn ""

    0

This code is from FSharp-logic-examples and in particular Anh-Dung Phan

While I have not checked for the root cause, I suspect it is because of the size of the items for 64-bit is larger that the size of the items for 32-bit and even though the number of items put into the stack and the stack size remains the same for both versions, the item size increase pushes the memory needed for the stack over the 1 megabyte limit.

TL;DR

This has been a fun and enlightening question to answer. I am glad it was asked.

Originally the problem appeared to be related to the use of |> and TCO and since that is still of value I am leaving it in the answer. I would also like to thank the OP for being response and helpful, it is a pleasure to help someone who works with you and not against you.

In the following code which is recursive and has |> is run in Debug mode within Visual Studio it causes a StackOverflow.

If it is started from the command line from the bin\release directory it does NOT cause a StackOverflow.

Using Visual Studio 15 Community

[<EntryPoint>]
let main argv = 

    let largeList = 
        printfn "Creating large list"
        [
            for i in 1 .. 100000000 do
                yield i
        ]

    // causes StackOverflow in Debug
    // No StackOverflow in Release
    let sum4 l =
        printfn "testing sum4"
        let rec sumInner4 l acc =
            match l with
            | h::t -> 
                let acc = acc + h
                acc |> sumInner4 t
            | [] -> acc
        sumInner4 l 0

    let result4 = sum4 largeList
    printfn "result4: %A" result4

Where Release or Debug is set in the Visual Studio toolbar

enter image description here

and the options for the project in Debug mode are

enter image description here

and the options for the project in Release mode are

enter image description here

tldr;

In the process of testing this I created 16 different test and built them in both debug and release mode and verified if they ran to completion or threw a stack overflow. The 16 are broken down into a set of 4 with 4 cases each. The cases 1,5,9,13 are a negative and produce a stack overflow to ensure that a stack overflow can be created. Cases 2,6,10,14 are a positive to show that the tail call is working and not causing a stack overflow. Cases 3,7,11,15 show a tail call with an operation done in the same statement as the tail call, and to be one factorization away from the test cases using |>; these work as expected. Cases 4,8,12,16 use |> and shows when it does and does not work in debug mode, which is probably a surprise to many. Cases 1-4 and 9-12 use a function of the form f x y, cases 8-11 use a function of the form f x and cases 12-16 use a function of the form f x y z. I originally did the first 8 test cases but after Keith's comment did 4 more which don't use a list but still use a function of the from f x y and present the unexpected result and then did 4 more that use a function of the form f x y z.

To run a test you will have to comment out all but the one test you plan to run and the build it once in debug mode, which can then be run from within Visual Studio, and then again build it in release mode and run it. I run it from a command line to ensure I am running the release version.

[<EntryPoint>]
let main argv = 

    let largeList = 
        printfn "Creating large list"
        [
            for i in 1 .. 100000000 do
                yield i
        ]

    // causes StackOverflow in Debug
    // causes StackOverflow in Release
    //   Negative confirmation
    //   A supposed tail call that DOES cause a stack overflow in both debug and release mode
    //   options: f x y
    let sum1 l = 
        printfn "test 01: "
        let rec sum1Inner l acc =
            match l with
            | h::t -> 
                let acc = acc + h
                1 + sum1Inner t acc
            | [] -> acc
        sum1Inner l 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   Positive confirmation
    //   A tail call that DOES NOT cause a stack overflow in both debug and release mode
    //   options: f x y
    let sum2 l =
        printfn "test 02: "
        let rec sum2Inner l acc =
            match l with
            | h::t -> 
                let acc = acc + h
                sum2Inner t acc
            | [] -> acc
        sum2Inner l 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case
    //   options: f x y and no |>
    let sum3 l =
        printfn "test 03: "
        let rec sum3Inner l acc =
            match l with
            | h::t -> 
                sum3Inner t (acc + h)
            | [] -> acc
        sum3Inner l 0
        
    // causes StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case
    //   options: f x y and |>
    let sum4 l =
        printfn "test 04: "
        let rec sum4Inner l acc =
            match l with
            | h::t -> 
                let acc = acc + h
                acc |> sum4Inner t
            | [] -> acc
        sum4Inner l 0
        
    // causes StackOverflow in Debug
    // causes StackOverflow in Release
    //   Negative confirmation
    //   A supposed tail call that DOES cause a stack overflow in both debug and release mode
    //   options: f x
    let sum5 () =
        printfn "test 05: "
        let rec sum5Inner x =
            match x with 
            | 10000000 -> x
            | _ -> 
                let acc = x + 1
                1 + sum5Inner acc
        sum5Inner 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   Positive confirmation
    //   A tail call that DOES NOT cause a stack overflow in both debug and release mode
    //   options: f x
    let sum6 () =
        printfn "test 06: "
        let rec sum6Inner x =
            match x with 
            | 10000000 -> x
            | _ -> 
                let acc = x + 1
                sum6Inner acc
        sum6Inner 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //  A test case
    //  options: f x and no |>
    let sum7 l =
        printfn "test 07: "
        let rec sum7Inner x =
            match x with 
            | 10000000 -> x
            | _ -> sum7Inner (x + 1)
        sum7Inner 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case
    //   options: f x and |>
    let sum8 () =
        printfn "test 07: "
        let rec sumInner8 x =
            match x with
            | 10000000 -> x
            | _ -> 
                let acc = x + 1
                acc |> sumInner8 
        sumInner8 0

    // causes StackOverflow in Debug
    // causes StackOverflow in Release
    //   Negative confirmation"
    //   A supposed tail call that DOES cause a stack overflow in both debug and release mode"
    //   options: f x y"
    let sum9 () = 
        printfn "test 09: "
        let rec sum9Inner x y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                1 + sum9Inner x acc
        sum9Inner 1 0   
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   Positive confirmation
    //   A tail call that DOES NOT cause a stack overflow in both debug and release mode
    //   options: f x y
    let sum10 () =
        printfn "test 10: "
        let rec sum10Inner x y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                sum10Inner x acc
        sum10Inner 1 0

    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case
    //   options: f x y and no |>
    let sum11 () =
        printfn "test 11: "
        let rec sum11Inner x y =
            match y with
            | 10000000 -> y
            | _ -> 
                sum11Inner x (x + y) 
        sum11Inner 1 0
        
    // causes StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case
    //   options: f x y and |>
    let sum12 () =
        printfn "test 12: "
        let rec sum12Inner x y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                acc |> sum12Inner x
        sum12Inner 1 0

    // causes StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case"
    //   options: f x y and |>"
    let sum12 () =
        printfn "test 12: "
        let rec sum12Inner x y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                acc |> sum12Inner x
        sum12Inner 1 0

    // causes StackOverflow in Debug
    // causes StackOverflow in Release
    //   Negative confirmation"
    //   A supposed tail call that DOES cause a stack overflow in both debug and release mode"
    //   options: f x y"
    let sum13 () = 
        printfn "test 13: "
        let rec sum13Inner x z y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                1 + sum13Inner x z acc 
        sum13Inner 1 "z" 0
        
    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   Positive confirmation"
    //   A tail call that DOES NOT cause a stack overflow in both debug and release mode"
    //   options: f x y"
    let sum14 () =
        printfn "test 14: "
        let rec sum14Inner x z y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                sum14Inner x z acc
        sum14Inner 1 "z" 0

    // No StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case"
    //   options: f x y and no |>"
    let sum15 () =
        printfn "test 15: "
        let rec sum15Inner x z y =
            match y with
            | 10000000 -> y
            | _ -> 
                sum15Inner x z (x + y) 
        sum15Inner 1 "z" 0

    // causes StackOverflow in Debug
    // No StackOverflow in Release
    //   A test case"
    //   options: f x y and |>"
    let sum16 () =
        printfn "test 16: "
        let rec sum16Inner x z y =
            match y with
            | 10000000 -> y
            | _ -> 
                let acc = x + y
                acc |> sum16Inner x z
        sum16Inner 1 "z" 0

    let result1 = sum1 largeList
    printfn "result1: %A" result1

    let result2 = sum2 largeList
    printfn "result2: %A" result2

    let result3 = sum3 largeList
    printfn "result3: %A" result3

    let result4 = sum4 largeList
    printfn "result4: %A" result4

    let result5 = sum5 ()
    printfn "result5: %A" result5

    let result6 = sum6 ()
    printfn "result6: %A" result6

    let result7 = sum7 ()
    printfn "result7: %A" result7

    let result8 = sum8 ()
    printfn "result8: %A" result8

    let result9 = sum9 ()
    printfn "result9: %A" result9

    let result10 = sum10 ()
    printfn "result10: %A" result10

    let result11 = sum11 ()
    printfn "result11: %A" result11

    let result12 = sum12 ()
    printfn "result12: %A" result12

    let result13 = sum13 ()
    printfn "result13: %A" result13

    let result14 = sum14 ()
    printfn "result14: %A" result14

    let result15 = sum15 ()
    printfn "result15: %A" result15

    let result16 = sum16 ()
    printfn "result16: %A" result16
    
    printf "Press any key to exit: "
    System.Console.ReadKey() |> ignore
    printfn ""

    0 // return an integer exit code

Additional, new info

EDIT: This thread on Github has Don Syme, creator of F#, specifically mention that:

[...] Second, you are correct, we don't guarantee to optimize uses of f <| x or x |> f or any similar to first-calling tailcalls even if f x is a tailcall.

like image 103
Guy Coder Avatar answered Oct 25 '22 11:10

Guy Coder