Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does deep usage of the stack cause superlinear time behavior for a simple interpreter?

Tags:

.net

memory

f#

type Expr =
    | Lit of int
    | Add of Expr * Expr

let rec intr = function
    | Lit _ as x -> x
    | Add(Lit a,Lit b) -> Lit <| a + b
    | Add(a,b) -> intr <| Add(intr a, intr b)

let rec intr_cps x ret =
    match x with
    | Lit _ as x -> ret x
    | Add(Lit a,Lit b) -> Lit (a + b) |> ret
    | Add(a,b) -> 
        intr_cps a <| fun a ->
            intr_cps b <| fun b ->
                intr_cps (Add(a, b)) ret

let rec add n =
    if n > 1 then Add(Lit 1, add (n-1))
    else Lit 1

open System.Threading

let mem = 1024*1024*512 // ~536mb
// It stack overflows without being spun on a separate thread.
// By default, the program only has a few mb of stack memory at its disposal.
let run f = Thread(ThreadStart f,mem).Start() 

run <| fun _ ->
    let f n =
        let x = add n
        let stopwatch = System.Diagnostics.Stopwatch.StartNew()
        printfn "%A" (intr x)
        printfn "n_%i_std = %A" n stopwatch.Elapsed

        stopwatch.Restart()
        printfn "%A" (intr_cps x id)
        printfn "n_%i_cps = %A" n stopwatch.Elapsed
    f <| 1000*1000/2
    f <| 1000*1000
    f <| 1000*1000*2

//Lit 500000
//n_500000_std = 00:00:00.7764730
//Lit 500000
//n_500000_cps = 00:00:00.0800371
//Lit 1000000
//n_1000000_std = 00:00:02.9531043
//Lit 1000000
//n_1000000_cps = 00:00:00.1941828
//Lit 2000000
//n_2000000_std = 00:00:13.7823780
//Lit 2000000
//n_2000000_cps = 00:00:00.2767752

I have a much bigger interpreter whose performance behavior I am trying to better understand so I made the above. I am definitely sure now that the superlinear time scaling I am seeing in it on some examples is related to the way it uses the stack, but I am not sure why this is happening so I wanted to ask here.

As you can see, as I vary the n by 2x, the time varies much more than that, and it seems like the scaling is exponential which is surprising to me. Also it is surprising that the CPS'd interpreter is faster than the stack based one. Why is that?

I also want to ask if I would see this same behavior if I coded the equivalent of the above in a non .NET language or even C?

like image 550
Marko Grdinić Avatar asked Aug 13 '17 16:08

Marko Grdinić


1 Answers

Looks like most of what you're measuring is building the data structure. Factor out

let data = add n

outside the time measurement (and replace add n with data inside) and the CPS goes linear.

I don't know enough about threads with large stacks and memory performance to explain the rest offhand, and haven't profiled the memory to get any feel.

like image 174
Brian Avatar answered Oct 12 '22 01:10

Brian