Why is Seq.iter 2x faster than a for loop when the target is x64?

Tags:

f#

Disclaimer: this is a micro-benchmark; please do not comment with quotes such as "premature optimization is evil" if you feel unhappy about the topic.

The examples are built in Release mode targeting x64, with .NET 4.5, Visual Studio 2012 and F# 3.0, and run on Windows 7 x64.

After profiling, I narrowed the bottleneck of one of my applications down to this, so I want to raise the question:

Observations

If there is no inner loop inside the for..in loop or Seq.iter, they are clearly of similar speed (update2 vs update4).

If there is an inner loop inside the for..in loop or Seq.iter, Seq.iter appears to be about 2x as fast as for..in (update vs update3). Strange? (If run in FSI they are similar.)

If the build targets AnyCPU and is run on x64, there is no difference in time. So the question becomes: why does Seq.iter (update3) get a 2x speedup when the target is x64? (A quick way to check which jitter actually ran is sketched right after this list.)
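
Side check, since AnyCPU results can be misleading: an AnyCPU executable may still run as a 32-bit process (for example if "Prefer 32-bit" is enabled in VS2012), in which case the x86 jitter, not the x64 one, compiles the code. A minimal way to verify at runtime:

// Prints whether the current process is 64-bit, i.e. which jitter compiled the code.
printfn "64-bit process: %b (IntPtr.Size = %d)" System.Environment.Is64BitProcess System.IntPtr.Size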

Time taken:

update:  00:00:11.4250483 // 2x as much as update3, why?
update2: 00:00:01.4447233
update3: 00:00:06.0863791
update4: 00:00:01.4939535

Source Code:

open System.Diagnostics
open System

[<EntryPoint>]
let main argv = 
    let pool = seq {1 .. 1000000}

    let ret = Array.zeroCreate 100

    // for..in with an inner loop (the slow case when targeting x64)
    let update pool =
        for x in pool do
            for y in 1 .. 200 do
                ret.[2] <- x + y

    // for..in without an inner loop
    let update2 pool =
        for x in pool do
            //for y in 1 .. 100 do
                ret.[2] <- x

    // Seq.iter with an inner loop
    let update3 pool =
        pool
            |> Seq.iter (fun x ->
                                  for y in 1 .. 200 do
                                      ret.[2] <- x + y)

    // Seq.iter without an inner loop
    let update4 pool =
        pool
            |> Seq.iter (fun x ->
                                  //for y in 1 .. 100 do
                                      ret.[2] <- x)


    // Runs the selected update function 50 times over the same sequence.
    let test n =
        let run = match n with
                  | 1 -> update
                  | 2 -> update2
                  | 3 -> update3
                  | 4 -> update4
                  | _ -> failwith "unknown test number"
        for i in 1 .. 50 do
            run pool

    let sw = new Stopwatch()
    sw.Start()
    test(1)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(2)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(3)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)

    sw.Restart()
    test(4)
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    0 // return an integer exit code
asked Nov 03 '12 by colinfang

1 Answer

This isn't a complete answer, but I hope it helps you dig further.

I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:

open System

let test1() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}    
    for x in pool do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y

let test2() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}    
    Seq.iter (fun x -> 
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y) pool

// Runs f once and prints the elapsed wall-clock time.
let time f =
    let sw = new Diagnostics.Stopwatch()
    sw.Start()
    let result = f() 
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    result

[<EntryPoint>]
let main argv =
    time test1
    time test2
    0

In this example, Seq.iter and for x in pool are each executed only once, but there is still a 2x time difference between test1 and test2:

00:00:06.9264843
00:00:03.6834886
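
For context, a for x in ... loop over a seq<_> and Seq.iter both boil down to the same enumerator walk; the snippet below is only a hand-written approximation of that shape (not the actual compiler output or FSharp.Core source):

// Rough shape of "for x in pool do <body>" when pool is a seq<_>
// (the enumerator is disposed in a compiler-generated try/finally):
let forInEquivalent (pool: seq<int>) body =
    use e = pool.GetEnumerator()
    while e.MoveNext() do
        body e.Current

// Rough shape of Seq.iter: the same loop, invoking the closure once per element.
let seqIterEquivalent f (pool: seq<int>) =
    use e = pool.GetEnumerator()
    while e.MoveNext() do
        f e.Current

The practical difference is where the hot inner loops end up: inline in the same method as the MoveNext loop (test1) versus inside a closure invoked per element (test2), which is consistent with the refactoring shown further down.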

Their ILs are very similar, so F# compiler optimization isn't the problem. It seems the x64 jitter fails to optimize test1, although it manages to do so for test2. Interestingly, if I refactor the nested for loops of test1 into a separate function, JIT optimization succeeds again:

let body (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y

let test3() =
    let ret = Array.zeroCreate 100
    let pool = {1..1000000}    
    for x in pool do
        body ret x

// 00:00:03.7012302
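
An extra check worth doing (untested here): make sure the recovered speed is not merely the JIT inlining body back into test3; marking the helper as non-inlinable and re-measuring would rule that out. A sketch:

open System.Runtime.CompilerServices

// Prevents the JIT from inlining the loop body back into its caller,
// so the timing reflects the refactored shape rather than an inlined one.
[<MethodImpl(MethodImplOptions.NoInlining)>]
let bodyNoInline (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y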

When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.
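
For reference, one documented way to disable JIT optimization for a .NET Framework release build (not necessarily the exact technique linked above) is to place an .ini file named after the executable (e.g. MyApp.ini next to MyApp.exe) containing:

[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0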

Why the x64 jitter fails on this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line. Maybe someone with good ASM knowledge can spot the difference.

answered Sep 25 '22 by pad