Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to eliminate time spent in JIT_TailCall for functions that are genuinely non-recursive

Tags:

.net

f#

I am writing a 64 bit F# solution and profiling has revealed a surprisingly & unexpectedly large amount of time spent in JIT_TailCall...it is in fact dominating the runtime (circa 80%). This appears together with its evil cousin JIT_TailCallHelperStub_ReturnAddress.

I have definitely traced the source to passing a struct type (custom value types) in a method or property call across an assembly boundary. I am certain of this because if I bypass the method call and assign my struct to the property directly (the one the offending method was using) the performance magically improves by a factor of 4-5 x less runtime!

The calling assembly is using F# 3.1 because it is being dynamically compiled with the latest stable release of FSharp.Compiler.Services.

The assembly being called is using F# 4.0 / .NET 4.6 (VS 2015).

UPDATE

A simplification of what I am trying to do is to assign a custom struct value to a position in an array from a dynamically generated assembly...

Runtime is fast and no extraneous tail calls are generated when calling:

  1. A property exposing the private array in the type

However, the runtime is slow due to extraneous tail calls being generated when calling:

  1. An indexer property exposing the array (Item)

  2. A member method acting as a setter for the array

The reason I need to call the member method is that I need to perform a few checks prior to insertion of the item in the array.

PRACTICAL

Over and above understanding the root of the issue, I would like to know whether F# 4.0 and by implication the coming release of FSharp.Compiler.Services would solve this issue. Given that the updated FSharp.Compiler.Services is relatively imminent, it may then just be best to wait.

like image 643
Sam Avatar asked Jul 15 '15 14:07

Sam


1 Answers

I posted this on your GitHub question, but cross-post it here so it is easier to find:

I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%.

I haven't profiled this snippet, but this is how my real code is structured:

#time "on"
type MyRecType() = 

  let list = System.Collections.Generic.List()

  member this.DoWork() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1

    let rec outerLoop() =
      if local < 1000000 then
        innerLoop(1)
      else
        tcs.SetResult(local)
        ()

    and innerLoop(inc:int) =
      if local % 2 = 0 then
        local <- local + inc
        outerLoop()
      else
        list.Add(local) // just fake access to a field to illustrate the pattern
        local <- local + 1
        innerLoop(inc)

    outerLoop()

    returnTask


let instance = MyRecType()

instance.DoWork().Result

> Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

.NET 4.6 and F# 4.0 don't help at all.

I tried to rewrite this as methods, but got StackOverflowException. However, I do not understand why I am not getting SO when I run a very big number of iterations without tail call generation?

Update Rewriting the method as:

  member this.DoWork2() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1
    let rec loop(isOuter:bool, inc:int) =
      if isOuter then
        if local < 1000000 then
          loop(false,1)
        else
          tcs.SetResult(local)
          ()
      else
        if local % 2 = 0 then
          local <- local + inc
          loop(true,1)
        else
          list.Add(local) // just fake access to a field to illustrate the pattern
          local <- local + 1
          loop(false,1)

    loop(true,1)

    returnTask


> Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

reduces JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% an 2% of execution time that is 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.

like image 199
V.B. Avatar answered Oct 27 '22 23:10

V.B.