type Expr =
| Lit of int
| Add of Expr * Expr
let rec intr = function
| Lit _ as x -> x
| Add(Lit a,Lit b) -> Lit <| a + b
| Add(a,b) -> intr <| Add(intr a, intr b)
let rec intr_cps x ret =
match x with
| Lit _ as x -> ret x
| Add(Lit a,Lit b) -> Lit (a + b) |> ret
| Add(a,b) ->
intr_cps a <| fun a ->
intr_cps b <| fun b ->
intr_cps (Add(a, b)) ret
let rec add n =
if n > 1 then Add(Lit 1, add (n-1))
else Lit 1
open System.Threading
let mem = 1024*1024*512 // ~536mb
// It stack overflows without being spun on a separate thread.
// By default, the program only has a few mb of stack memory at its disposal.
let run f = Thread(ThreadStart f,mem).Start()
run <| fun _ ->
let f n =
let x = add n
let stopwatch = System.Diagnostics.Stopwatch.StartNew()
printfn "%A" (intr x)
printfn "n_%i_std = %A" n stopwatch.Elapsed
stopwatch.Restart()
printfn "%A" (intr_cps x id)
printfn "n_%i_cps = %A" n stopwatch.Elapsed
f <| 1000*1000/2
f <| 1000*1000
f <| 1000*1000*2
//Lit 500000
//n_500000_std = 00:00:00.7764730
//Lit 500000
//n_500000_cps = 00:00:00.0800371
//Lit 1000000
//n_1000000_std = 00:00:02.9531043
//Lit 1000000
//n_1000000_cps = 00:00:00.1941828
//Lit 2000000
//n_2000000_std = 00:00:13.7823780
//Lit 2000000
//n_2000000_cps = 00:00:00.2767752
I have a much bigger interpreter whose performance behavior I am trying to better understand so I made the above. I am definitely sure now that the superlinear time scaling I am seeing in it on some examples is related to the way it uses the stack, but I am not sure why this is happening so I wanted to ask here.
As you can see, as I vary the n
by 2x, the time varies much more than that, and it seems like the scaling is exponential which is surprising to me. Also it is surprising that the CPS'd interpreter is faster than the stack based one. Why is that?
I also want to ask if I would see this same behavior if I coded the equivalent of the above in a non .NET language or even C?
Looks like most of what you're measuring is building the data structure. Factor out
let data = add n
outside the time measurement (and replace add n
with data
inside) and the CPS goes linear.
I don't know enough about threads with large stacks and memory performance to explain the rest offhand, and haven't profiled the memory to get any feel.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With