Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to improve performance with F# idioms

Tags:

performance

f#

I'm using this course on Machine-Learning to learn F# at the same time. I've done the following homework exercise which is the first exercise of the second week:

Run a computer simulation for flipping 1,000 virtual fair coins. Flip each coin independently 10 times. Focus on 3 coins as follows: c1 is the first coin flipped, crand is a coin chosen randomly from the 1,000, and cmin is the coin which had the minimum frequency of heads (pick the earlier one in case of a tie).

Let ν1 , νrand , and νmin be the fraction of heads obtained for the 3 respective coins out of the 10 tosses. Run the experiment 100,000 times in order to get a full distribution of ν1 , νrand, and νmin (note that c rand and c min will change from run to run).

What is the average value of νmin?

I have produced the following code, which works fine and gives the correct answer:

let private rnd = System.Random()
let FlipCoin() = rnd.NextDouble() > 0.5
let FlipCoinNTimes N = List.init N (fun _ -> FlipCoin())
let FlipMCoinsNTimes M N = List.init M (fun _ -> FlipCoinNTimes N)

let ObtainFrequencyOfHeads tosses = 
    let heads = tosses |> List.filter (fun toss -> toss = true)
    float (List.length (heads)) / float (List.length (tosses))

let GetFirstRandMinHeadsFraction allCoinsLaunchs = 
    let first = ObtainFrequencyOfHeads(List.head (allCoinsLaunchs))
    let randomCoin = List.item (rnd.Next(List.length (allCoinsLaunchs))) allCoinsLaunchs
    let random = ObtainFrequencyOfHeads(randomCoin)

    let min = 
        allCoinsLaunchs
        |> List.map (fun coin -> ObtainFrequencyOfHeads coin)
        |> List.min
    (first, random, min)

module Exercice1 = 
    let GetResult() = 
        Seq.init 100000 (fun _ -> FlipMCoinsNTimes 1000 10)
        |> Seq.map (fun oneExperiment -> GetFirstRandMinHeadsFraction oneExperiment)
        |> Seq.map (fun (first, random, min) -> min)
        |> Seq.average

However, it takes roughly 4 minutes to run in my machine. I know that it is doing a lot of work, but I'm wondering if there are some modifications that could be made to optimize it.

As I'm trying lo learn F#, I'm asking for optimizations that use F# idioms, not to change the code to a C-style.

Feel free to suggest any kind of improvement, in style, good practices, etc.

[UPDATE]

I have written some code to compare the proposed solutions, it is accesible here.

These are the results:

Base - result: 0.037510, time elapsed: 00:00:55.1274883, improvement: 0.99 x

Matthew Mcveigh - result: 0.037497, time elapsed: 00:00:15.1682052, improvement: 3.61 x

Fyodor Soikin - result:0.037524, time elapsed: 00:01:29.7168787, improvement: 0.61 x

GuyCoder - result: 0.037645, time elapsed: 00:00:02.0883482, improvement: 26.25 x

GuyCoder MathNet- result: 0.037666, time elapsed: 00:00:24.7596117, improvement: 2.21 x

TheQuickBrownFox - result: 0.037494, time elapsed: 00:00:34.2831239, improvement: 1.60 x

The winner concerning the improvement in time is the GuyCoder, so I will accept his answer. However, I find that his code is more difficult to understand.

like image 743
Tao Gómez Gil Avatar asked Jun 04 '16 17:06

Tao Gómez Gil


2 Answers

Allocating a large amount of lists up front is heavy work, the algorithm can be processed online e.g. via sequences or recursion. I transformed all the work into tail recursive functions for some raw speed (will be transformed into loops by the compiler)

not guaranteed to be 100% correct, but hopefully gives you a gist of where I was going with it:

let private rnd = System.Random()
let flipCoin () = rnd.NextDouble() > 0.5

let frequencyOfHeads flipsPerCoin = 
    let rec countHeads numHeads i =
        if i < flipsPerCoin then
            let isHead = flipCoin ()
            countHeads (if isHead then numHeads + 1 else numHeads) (i + 1)
        else
            float numHeads

    countHeads 0 0 / float flipsPerCoin

let getFirstRandMinHeadsFraction numCoins flipsPerCoin = 
    let randomCoinI = rnd.Next numCoins

    let rec run first random min i =
        if i < numCoins then
            let frequency = frequencyOfHeads flipsPerCoin
            let first = if i = 0 then frequency else first
            let random = if i = randomCoinI then frequency else random
            let min = if min > frequency then frequency else min

            run first random min (i + 1)
        else
            (first, random, min)

    run 0.0 0.0 System.Double.MaxValue 0

module Exercice1 = 
    let getResult () = 
        let iterations, numCoins, numFlips = 100000, 1000, 10

        let getMinFromExperiment () =
            let (_, _, min) = getFirstRandMinHeadsFraction numCoins numFlips
            min

        let rec sumMinFromExperiments i sumOfMin =
            if i < iterations then
                sumMinFromExperiments (i + 1) (sumOfMin + getMinFromExperiment ())
            else
                sumOfMin

        let sum = sumMinFromExperiments 0 0.0
        sum / float iterations
like image 188
Matthew Mcveigh Avatar answered Oct 21 '22 11:10

Matthew Mcveigh


Running your code on my computer and timing I get:

seconds: 68.481918
result: 0.47570994

Running my code on my computer and timing I get:

seconds: 14.003861
vOne: 0.498963
vRnd: 0.499793
vMin: 0.037675

with vMin being closest to the correct answer of b being 0.01

That is almost 5x faster.

I did not tinker with each method and data structure to figure out why and what worked, I just used many decades of experience to guide me. Clearly not storing the intermediate values but just the results is a big improvement. Specifically coinTest just returns the number of heads which is an int and not a list of the results. Also instead of getting a random number for each coin flip but getting a random number for each coin and then using each part of that random number as a coin flip is advantageous. That saves number of flips - 1 calls to a function. Also I avoided using float values until the very end; I don't consider that saving time on the CPU, but it did simplify the thought process of thinking only in int which allowed me to concentrate on other efficiencies. I know that may sound weird but the less I have to think about the better the answers I get. I also only ran coinTest when it was necessary, e.g. only the first coin, only the random coin, and looked for all tails as an exit condition.

namespace Workspace

module main =

    [<EntryPoint>]
    let main argv = 

        let rnd = System.Random()
        let randomPick (limit : int) : int = rnd.Next(limit)   // [0 .. limit) it's a Python habit

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000

        let coinTest (numberOfFlips : int) : int =
            let rec countHeads (flips : int) bitIndex (headCount : int) : int =
                if bitIndex < 0 then headCount
                else countHeads (flips >>> 1) (bitIndex-1) (headCount + (flips &&& 0x01))
            countHeads (randomPick ((pown 2 numberOfFlips) - 1)) numberOfFlips 0

        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =
            let (randomCoin : int) = randomPick numberOfCoins
            let rec testCoin coinIndex (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) : (int * int * int) =
                if (coinIndex < numberOfCoins) then
                    if (not cFirstDone || not cRanDone || not cMinDone) then
                        if (cFirstDone && cMinDone && (coinIndex <> randomCoin)) then
                             testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                        else
                            let headsTotal = coinTest numberOfFlips 
                            let (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone) =
                                let cFirst = if coinIndex = 0 then headsTotal else cFirst
                                let cRnd = if coinIndex = randomCoin then headsTotal else cRnd
                                let cMin = if headsTotal < cMin then headsTotal else cMin
                                let cRanDone = if (coinIndex >= randomCoin) then true else cRanDone
                                let cMinDone = if (headsTotal = 0) then true else cMinDone
                                (cFirst, cRnd, cMin, true, cRanDone, cMinDone)
                            testCoin (coinIndex+1) (cFirst, cRnd, cMin, cFirstDone, cRanDone, cMinDone)
                    else
                        (cFirst, cRnd, cMin)
                else
                    (cFirst, cRnd, cMin)
            testCoin 0 (-1,-1,10, false, false, false)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) ( numberOfFlips : int) =
            let rec accumateExperiements index aOne aRnd aMin : (int * int * int) =
                let (cOne,cRnd,cMin) = runExperiement numberOfCoins numberOfFlips
                if index > numberOfExperiements then (aOne, aRnd, aMin)
                else accumateExperiements (index + 1) (aOne + cOne) (aRnd + cRnd) (aMin + cMin)
            let (aOne, aRnd, aMin) = accumateExperiements 0 0 0 0
            let (vOne : double) = (double)(aOne) / (double)numberOfExperiements / (double)numberOfFlips
            let (vRnd : double) = (double)(aRnd) / (double)numberOfExperiements / (double)numberOfFlips
            let (vMin : double) = (double)(aMin) / (double)numberOfExperiements / (double)numberOfFlips
            (vOne, vRnd, vMin)

        let timeIt () = 
            let stopWatch = System.Diagnostics.Stopwatch.StartNew()
            let (vOne, vRnd, vMin) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
            stopWatch.Stop()
            printfn "seconds: %f" (stopWatch.Elapsed.TotalMilliseconds / 1000.0)
            printfn "vOne: %A" vOne
            printfn "vRnd: %A" vRnd
            printfn "vMin: %A" vMin

        timeIt ()

        printf "Press any key to exit: "
        System.Console.ReadKey() |> ignore
        printfn ""

        0 // return an integer exit code

========================================================================

This is just an intermediate answer because I inquired if the OP considered using MathNet Numerics idiomatic F# and the OP wanted to see what that looked like. After running his version and this first cut version on my machine the OP version is faster. OP: 75 secs, mine: 84 secs

namespace Workspace

open MathNet.Numerics.LinearAlgebra

module main =

    [<EntryPoint>]
    let main argv = 

        let rnd = System.Random()
        let flipCoin() = 
            let head = rnd.NextDouble() > 0.5
            if head then 1.0 else 0.0

        let numberOfCoins = 1000
        let numberOfFlips = 10
        let numberOfExperiements = 100000
        let numberOfValues = 3

        let randomPick (limit : int) : int = rnd.Next(limit)   // [0 .. limit) it's a Python habit
        let headCount (m : Matrix<float>) (coinIndex : int) : int = 
            System.Convert.ToInt32((m.Row coinIndex).Sum())

        let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
            let rec findMinHeads currentCoinIndex minHeadsCount minHeadsIndex =
                match currentCoinIndex,minHeadsCount with
                | -1,_ -> minHeadsCount
                | _,0 -> minHeadsCount  // Can't get less than zero so stop searching.
                | _ ->
                    let currentMinHeadCount = (headCount m currentCoinIndex)
                    let nextIndex = currentCoinIndex - 1
                    if currentMinHeadCount < minHeadsCount 
                    then findMinHeads nextIndex currentMinHeadCount currentCoinIndex
                    else findMinHeads nextIndex minHeadsCount minHeadsIndex
            findMinHeads (numberOfCoins - 1) numberOfFlips -1

        // Return the values for cOne, cRnd, and cMin as int values. 
        // Will do division on final sum of experiments instead of after each experiment.
        let runExperiement (numberOfCoins : int) (numberOfFlips : int) : (int * int * int) =        
            let (flips : Matrix<float>) = DenseMatrix.init numberOfCoins numberOfFlips (fun i j -> flipCoin())
            let cOne = headCount flips 0
            let cRnd = headCount flips (randomPick numberOfCoins)
            let cMin = minHeads flips numberOfCoins numberOfFlips
            (cOne,cRnd,cMin)

        let runExperiements (numberOfExperiements : int) (numberOfCoins : int) (numberOfFlips : int) : (int [] * int [] * int []) =
            let (cOneArray : int[]) = Array.create numberOfExperiements 0
            let (cRndArray : int[]) = Array.create numberOfExperiements 0
            let (cMinArray : int[]) = Array.create numberOfExperiements 0
            for i = 0 to (numberOfExperiements - 1) do
                let (cOne,cRnd,cMin) = runExperiement numberOfCoins numberOfFlips
                cOneArray.[i] <- cOne 
                cRndArray.[i] <- cRnd 
                cMinArray.[i] <- cMin 
            (cOneArray, cRndArray, cMinArray)

        let (cOneArray, cRndArray, cMinArray) = runExperiements numberOfExperiements numberOfCoins numberOfFlips
        let (vOne : double) = (double)(Array.sum cOneArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vRnd : double) = (double)(Array.sum cRndArray) / (double)numberOfExperiements / (double)numberOfFlips
        let (vMin : double) = (double)(Array.sum cMinArray) / (double)numberOfExperiements / (double)numberOfFlips

        printfn "vOne: %A" vOne
        printfn "vRnd: %A" vRnd
        printfn "vMin: %A" vMin

Halfway through the coding I realized I could do all of the calculations using just int, it was only the last calculations that generated the percentages that needed to be a float or double and even then that is only because the list of answers is a percentage; in theory the numbers can be compared as int to get the same understanding. If I use only int then I would have to create an int Matrix type and that is more work than I want to do. When I get time I will switch the MathNet Matrix to an F# Array2D or something similar and check that. Note if you tag this with MathNet then the maintainer of MathNet might answer (Christoph Rüegg)

I made an change to this method and it is faster by 5 seconds.

// faster
let minHeads (m : Matrix<float>) (numberOfCoins : int) (numberOfFlips : int) : int =
    let (mins : float[]) = m.FoldByRow((fun (x : float) y -> x + y), 0.0)
    let (minHead : float) = Array.min mins
    System.Convert.ToInt32(minHead)
like image 28
Guy Coder Avatar answered Oct 21 '22 11:10

Guy Coder