I am writing a 64 bit F# solution and profiling has revealed a surprisingly & unexpectedly large amount of time spent in JIT_TailCall
...it is in fact dominating the runtime (circa 80%). This appears together with its evil cousin JIT_TailCallHelperStub_ReturnAddress
.
I have definitely traced the source to passing a struct
type (custom value types) in a method or property call across an assembly boundary. I am certain of this because if I bypass the method call and assign my struct
to the property directly (the one the offending method was using) the performance magically improves by a factor of 4-5 x less runtime!
The calling assembly is using F# 3.1 because it is being dynamically compiled with the latest stable release of FSharp.Compiler.Services.
The assembly being called is using F# 4.0 / .NET 4.6 (VS 2015).
A simplification of what I am trying to do is to assign a custom struct
value to a position in an array from a dynamically generated assembly...
Runtime is fast and no extraneous tail calls are generated when calling:
However, the runtime is slow due to extraneous tail calls being generated when calling:
An indexer property exposing the array (Item)
A member method acting as a setter for the array
The reason I need to call the member method is that I need to perform a few checks prior to insertion of the item in the array.
Over and above understanding the root of the issue, I would like to know whether F# 4.0 and by implication the coming release of FSharp.Compiler.Services would solve this issue. Given that the updated FSharp.Compiler.Services is relatively imminent, it may then just be best to wait.
I posted this on your GitHub question, but cross-post it here so it is easier to find:
I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%.
I haven't profiled this snippet, but this is how my real code is structured:
#time "on"
type MyRecType() =
let list = System.Collections.Generic.List()
member this.DoWork() =
let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
let returnTask = tcs.Task // NB! must access this property first
let mutable local = 1
let rec outerLoop() =
if local < 1000000 then
innerLoop(1)
else
tcs.SetResult(local)
()
and innerLoop(inc:int) =
if local % 2 = 0 then
local <- local + inc
outerLoop()
else
list.Add(local) // just fake access to a field to illustrate the pattern
local <- local + 1
innerLoop(inc)
outerLoop()
returnTask
let instance = MyRecType()
instance.DoWork().Result
> Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001
.NET 4.6 and F# 4.0 don't help at all.
I tried to rewrite this as methods, but got StackOverflowException. However, I do not understand why I am not getting SO when I run a very big number of iterations without tail call generation?
Update Rewriting the method as:
member this.DoWork2() =
let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
let returnTask = tcs.Task // NB! must access this property first
let mutable local = 1
let rec loop(isOuter:bool, inc:int) =
if isOuter then
if local < 1000000 then
loop(false,1)
else
tcs.SetResult(local)
()
else
if local % 2 = 0 then
local <- local + inc
loop(true,1)
else
list.Add(local) // just fake access to a field to illustrate the pattern
local <- local + 1
loop(false,1)
loop(true,1)
returnTask
> Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001
reduces JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% an 2% of execution time that is 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With