Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Overhead involved in nested sequences in F#

I need to do excessive pattern matching against an xml structure, so I declared a type to represent the xml nodes. The xml is a multitree and I need to somehow iterate over the nodes. To make the tree enumerable, I use nested sequence comprehensions. My XML will never be too big, so simplicity beats performance in my case, but is the following code somehow dangerous for larger inputs, or would it be ok to leave it as it is.

type ElementInfo = { Tag : string; Attributes : Map<string, string> }
type Contents = 
    | Nothing
    | String of string
    | Elements of Node list
and Node =
    | Element of ElementInfo * Contents
    | Comment of string
    member node.PreOrder = 
        seq {
            match node with
            | Element (_, Elements children) as parent -> yield parent; yield! children |> Seq.collect (fun child -> child.PreOrder)
            | singleNode -> yield singleNode
        }
like image 947
Daniel Fabian Avatar asked Aug 12 '13 15:08

Daniel Fabian


1 Answers

I thought the call to Seq.collect might cause it to generate an IEnumerable per call, so I replaced it with a for loop and it produces the same output (via ILSpy).

That made me curious so I decompiled a simpler sequence expression I'd expect to be "flattened."

let rec infSeq n =
  seq {
    yield n
    yield! infSeq (n+1)
  }

The code to generate the next element in the sequence decompiles to this:

public override int GenerateNext(ref IEnumerable<int> next)
{
    switch (this.pc)
    {
    case 1:
        this.pc = 2;
        next = Test.infSeq(this.n + 1);
        return 2;
    case 2:
        this.pc = 3;
        break;
    case 3:
        break;
    default:
        this.pc = 1;
        this.current = this.n;
        return 1;
    }
    this.current = 0;
    return 0;
}

As you can see, it calls itself recursively, generating a fresh IEnumerable each time. A quick test in FSI

infSeq 0 |> Seq.take 10000000 |> Seq.length

you can see there's a lot of GC:

> Real: 00:00:01.759, CPU: 00:00:01.762, GC gen0: 108, gen1: 107, gen2: 1

Compared to the C# version

public static IEnumerable<int> InfSeq(int n) {
    while (true) yield return n++;
}

in FSI:

> Real: 00:00:00.991, CPU: 00:00:00.998, GC gen0: 0, gen1: 0, gen2: 0

It's faster and uses constant memory (no extra IEnumerables).

I thought F# would generate a single IEnumerable for yield! in a tail position, but apparently not.

EDIT

The spec confirms this: {| yield! expr |} is elaborated as expr, that is, subsequences (recursive or otherwise) are not merged into a single IEnumerable.

like image 173
Daniel Avatar answered Nov 13 '22 01:11

Daniel