Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I write effectively to a file in F#?

Tags:

f#

I want to generate large xml files for testing purpose but the code I ended up with is really slow, the time grows exponentially with the number of rows I write to the file. Th example below shows that it takes milliseconds to write 100 rows, but it takes over 20 seconds to write 1000 rows (on my machine). I really can't figured out what is making this slow, since I think that writing 1000 rows shouldn't take that long. Also, writing 200 rows takes about 4 times as long as writing 100 rows which is not good. To run the code you might want to change the path for the StreamWriter.

open System.IO
open System.Diagnostics

let xmlSeq = Seq.initInfinite (fun index -> sprintf "<author><name>name%d</name><age>%d</age><books><book>book%d</book></books></author>" index index index)

let createFile (seq: string seq) numberToTake fileName =
    use streamWriter = new StreamWriter("C:\\tmp\\FSharpXmlTest\\FSharpXmlTest\\" + fileName, false)
    streamWriter.WriteLine("<startTag>")
    let rec internalWriter (seq: string seq) (sw:StreamWriter) i (endTag:string) =
        match i with
        | 0 -> (sw.WriteLine(Seq.head seq);
            sw.WriteLine(endTag))
        | _ -> (sw.WriteLine(Seq.head seq);
            internalWriter (Seq.skip 1 seq) sw (i-1) endTag)
    internalWriter seq streamWriter numberToTake "</startTag>"

let funcTimer fn =
    let stopWatch = Stopwatch.StartNew()
    printfn "Timing started"
    fn()
    stopWatch.Stop()
    printfn "Time elased: %A" stopWatch.Elapsed


(funcTimer (fun () -> createFile xmlSeq 100 "file100.xml"))
(funcTimer (fun () -> createFile xmlSeq 1000 "file1000.xml"))
like image 790
Tomas Jansson Avatar asked Jan 08 '14 08:01

Tomas Jansson


2 Answers

You observed a quadratic behaviour O(n^2) on manipulating sequences. When you call Seq.skip, a brand new sequence will be created, so you implicitly traverse the remaining part. More detailed explanation could be found at https://stackoverflow.com/a/1306267.

In this example, you don't need to decompose sequences. Replacing your inner function by:

let internalWriter (seq: string seq) (sw:StreamWriter) i (endTag:string) =
    for node in Seq.take i seq do
        sw.WriteLine(node)
    sw.WriteLine(endTag)

I can write 10000 rows in fraction of a second.

You can refactor further by remove this inner function and copy its body to the parent function.

As the link above mentioned, if you ever need decomposing sequences, LazyList should be better to use.

like image 94
pad Avatar answered Sep 27 '22 22:09

pad


pad in his answer has pointed to the cause of slowdown. Another idiomatic approach might be instead of infinite sequence generating sequence of needed length with Seq.unfold, which makes the code really trivial:

let xmlSeq n = Seq.unfold (fun i ->
    if i = 0 then None
    else Some((sprintf "<author><name>name%d</name><age>%d</age><books><book>book%d</book></books></author>" i i i), i - 1)) n

let createFile seqLen fileName =
    use streamWriter = new StreamWriter("C:\\tmp\\FSharpXmlTest\\" + fileName, false)
    streamWriter.WriteLine("<startTag>")
    seqLen |> xmlSeq |> Seq.iter streamWriter.WriteLine
    streamWriter.WriteLine("</startTag>")

(funcTimer (fun () -> createFile  10000 "file10000.xml"))

Generating 10000 elements takes around 500ms on my laptop.

like image 25
Gene Belitski Avatar answered Sep 27 '22 22:09

Gene Belitski