performance of static member constraint functions

Question

I'm trying to learn static member constraints in F#. From reading Tomas Petricek's blog post, I understand that writing an inline function that "uses only operations that are themselves written using static member constraints" will make my function work correctly for all numeric types that satisfy those constraints. This question indicates that inline works somewhat similarly to c++ templates, so I wasn't expecting any performance difference between these two functions:

let MultiplyTyped (A : double[,]) (B : double[,]) =
    let rA, cA = (Array2D.length1 A) - 1, (Array2D.length2 A) - 1
    let cB = (Array2D.length2 B) - 1
    let C = Array2D.zeroCreate<double> (Array2D.length1 A) (Array2D.length2 B)
    for i = 0 to rA do
        for k = 0 to cA do
            for j = 0 to cB do
                C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]
    C

let inline MultiplyGeneric (A : 'T[,]) (B : 'T[,]) =
    let rA, cA = Array2D.length1 A - 1, Array2D.length2 A - 1
    let cB = Array2D.length2 B - 1
    let C = Array2D.zeroCreate<'T> (Array2D.length1 A) (Array2D.length2 B)
    for i = 0 to rA do
        for k = 0 to cA do
            for j = 0 to cB do
                C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]
    C

Nevertheless, to multiply two 1024 x 1024 matrixes, MultiplyTyped completes in an average of 2550 ms on my machine, whereas MultiplyGeneric takes about 5150 ms. I originally thought that zeroCreate was at fault in the generic version, but changing that line to the one below didn't make a difference.

let C = Array2D.init<'T> (Array2D.length1 A) (Array2D.length2 B) (fun i j -> LanguagePrimitives.GenericZero)

Is there something I'm missing here to make MultiplyGeneric perform the same as MultiplyTyped? Or is this expected?

edit: I should mention that this is VS2010, F# 2.0, Win7 64bit, release build. Platform target is x64 (to test larger matrices) - this makes a difference: x86 produces similar results for the two functions.

Bonus question: the type inferred for MultiplyGeneric is the following:

val inline MultiplyGeneric :
   ^T [,] ->  ^T [,] ->  ^T [,]
    when ( ^T or  ^a) : (static member ( + ) :  ^T *  ^a ->  ^T) and
          ^T : (static member ( * ) :  ^T *  ^T ->  ^a)

Where does the ^a type come from?

edit 2: here's my testing code:

let r = new System.Random()
let A = Array2D.init 1024 1024 (fun i j -> r.NextDouble())
let B = Array2D.init 1024 1024 (fun i j -> r.NextDouble())

let test f =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    f() |> ignore
    sw.Stop()
    printfn "%A" sw.ElapsedMilliseconds

for i = 1 to 5 do
    test (fun () -> MultiplyTyped A B)

for i = 1 to 5 do
    test (fun () -> MultiplyGeneric A B)

kvb · Accepted Answer

Good question. I'll answer the easy part first: the ^a is just part of the natural generalization process. Imagine you had a type like this:

type T = | T with
    static member (+)(T, i:int) = T
    static member (*)(T, T) = 0

Then you can still use your MultiplyGeneric function with arrays of this type: multiplying elements of A and B will give you ints, but that's okay because you can still add them to elements of C and get back values of type T to store back into C.

As to your performance question, I'm afraid I don't have a great explanation. Your basic understanding is right - using MultiplyGeneric with double[,] arguments should be equivalent to using MultiplyTyped. If you use ildasm to look at the IL the compiler generates for the following F# code:

let arr = Array2D.zeroCreate 1024 1024
let f1 = MultiplyTyped arr
let f2 = MultiplyGeneric arr

let timer = System.Diagnostics.Stopwatch()
timer.Start()

f1 arr |> ignore

printfn "%A" timer.Elapsed
timer.Restart()

f2 arr |> ignore

printfn "%A" timer.Elapsed

then you can see that the compiler really does generate identical code for each of them, putting the inlined code for MultipyGeneric into an internal static function. The only difference that I see in the generated code is in the names of locals, and when running from the command line I get roughly equal elapsed times. However, running from FSI I see a difference similar to what you've reported.

It's not clear to me why this would be. As I see it there are two possibilities:

FSI's code generation may be doing something slightly different than the static compiler
The CLR's JIT compiler may be treat code generated at runtime slightly differently from compiled code. For instance, as I mentioned my code above using MultiplyGeneric actually results in an internal method that contains the inlined body. Perhaps the CLR's JIT handles the difference between public and internal methods differently when they are generated at runtime than when they are in statically compiled code.

Daniel · Answer

I'd like to see your benchmarks. I don't get the same results (VS 2012 F# 3.0 Win 7 64-bit).

let m = Array2D.init 1024 1024 (fun i j -> float i * float j)

let test f =
  let sw = System.Diagnostics.Stopwatch.StartNew()
  f() |> ignore
  sw.Stop()
  printfn "%A" sw.Elapsed

test (fun () -> MultiplyTyped m m)
> 00:00:09.6013188

test (fun () -> MultiplyGeneric m m)
> 00:00:09.1686885

Decompiling with Reflector, the functions look identical.

Regarding your last question, the least restrictive constraint is inferred. In this line

C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]

because the result type of A.[i,k] * B.[k,j] is unspecified, and is passed immediately to (+), an extra type could be involved. If you want to tighten the constraint you can replace that line with

let temp : 'T = A.[i,k] * B.[k,j]
C.[i,j] <- C.[i,j] + temp

That will change the signature to

val inline MultiplyGeneric :
  A: ^T [,] -> B: ^T [,] ->  ^T [,]
    when  ^T : (static member ( * ) :  ^T *  ^T ->  ^T) and
          ^T : (static member ( + ) :  ^T *  ^T ->  ^T)

EDIT

Using your test, here's the output:

//MultiplyTyped
00:00:09.9904615
00:00:09.5489653
00:00:10.0562346
00:00:09.7023183
00:00:09.5123992
//MultiplyGeneric
00:00:09.1320273
00:00:08.8195283
00:00:08.8523408
00:00:09.2496603
00:00:09.2950196

Here's the same test on ideone (with a few minor changes to stay within the time limit: 512x512 matrix and one test iteration). It runs F# 2.0 and produced similar results.

performance of static member constraint functions

Tags:

performance

generics

f#

vlad

2 Answers

kvb

Daniel

Recent Activity

Donate For Us

performance of static member constraint functions

Tags:

performance

generics

f#

vlad

2 Answers

kvb

Daniel

Related questions

Recent Activity

Donate For Us