Why are some functions in the Seq module optimized whilst others were not in F#?

Question

This is a follow up to my previous question regarding the Seq module's iter and map functions being much slower compared to the Array and List module equivalents.

Looking at the source, I can see that some functions such as isEmpty and length performs a very simple type check to optimize for arrays and lists before resorting to using IEnumerator.

[<CompiledName("IsEmpty")>]
let isEmpty (source : seq<'T>)  = 
    checkNonNull "source" source
    match source with 
    | :? ('T[]) as a -> a.Length = 0
    | :? list<'T> as a -> a.IsEmpty
    | :? ICollection<'T> as a -> a.Count = 0
    | _ -> 
        use ie = source.GetEnumerator()
        not (ie.MoveNext())

[<CompiledName("Length")>]
let length (source : seq<'T>)    = 
    checkNonNull "source" source
    match source with 
    | :? ('T[]) as a -> a.Length
    | :? ('T list) as a -> a.Length
    | :? ICollection<'T> as a -> a.Count
    | _ -> 
        use e = source.GetEnumerator() 
        let mutable state = 0 
        while e.MoveNext() do
            state <-  state + 1;
        state

In the case of the iter the same approach can be done to vastly improve its performance, when I shadowed the iter function it presented significant gains over the built-in version:

[<CompiledName("Iterate")>]
let iter f (source : seq<'T>) = 
    checkNonNull "source" source
    use e = source.GetEnumerator()
    while e.MoveNext() do
        f e.Current;

My question is, given that some of the functions in the Seq module were optimized for use with specific collection types (arrays, list< T>, etc.) how come other functions such as iter and nth were not optimized in a similar way?

Also, in the case of map function, as @mausch pointed out, is it not possible to employ a similar approach to Enumerable.Select (see below) and build up specialized iterators for different collection types?

public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
      if (source == null)
        throw Error.ArgumentNull("source");
      if (selector == null)
        throw Error.ArgumentNull("selector");
      if (source is Enumerable.Iterator<TSource>)
        return ((Enumerable.Iterator<TSource>) source).Select<TResult>(selector);
      if (source is TSource[])
        return (IEnumerable<TResult>) new Enumerable.WhereSelectArrayIterator<TSource, TResult>((TSource[]) source, (Func<TSource, bool>) null, selector);
      if (source is List<TSource>)
        return (IEnumerable<TResult>) new Enumerable.WhereSelectListIterator<TSource, TResult>((List<TSource>) source, (Func<TSource, bool>) null, selector);
      else
        return (IEnumerable<TResult>) new Enumerable.WhereSelectEnumerableIterator<TSource, TResult>(source, (Func<TSource, bool>) null, selector);
    }

Many thanks in advance.

Guvante · Accepted Answer

In the case of the iter the same approach can be done to vastly improve its performance

I think this is where the answer to your question is. Your test is artificial, and doesn't actually test any real world examples of these methods. You tested 10,000,000 iterations of these methods in order to get differences in timing in ms.

Converted back to per item costs, here they are:

          Array   List
Seq.iter   4 ns    7 ns
Seq.map   20 ns   91 ns

These methods are typically used once per collection, meaning this cost is an additional linear factor to your algorithms performance. In the worst case you are losing less than 100 ns per item in a list (which you shouldn't be using if you care about performance that much).

Contrast this with the case of length which is always linear in the general case. By adding this optimization you provide enormous benefit to someone who forgot to manually cache the length but luckily is always given a list.

Similarly you may call isEmpty many times, and adding another object creation is silly if you can just ask directly. (This one isn't as strong an argument)

Another thing to keep in mind is that neither of those methods actually looks at more than one element of the output. What would you expect the following code to do (excluding syntax errors or missing methods)

type Custom() =
  interface IEnumerable with
    member x.GetEnumerator() =
      return seq {
        yield 1
        yield 2
      }
  interface IList with
    member x.Item with
      get(index) = index
    member x.Count = 12

 let a = Custom()
 a |> Seq.iter (v -> printfn (v.ToString()))

Why are some functions in the Seq module optimized whilst others were not in F#?

Tags:

performance

collections

f#

theburningmonk

1 Answers

Guvante

Recent Activity

Donate For Us

Why are some functions in the Seq module optimized whilst others were not in F#?

Tags:

performance

collections

f#

theburningmonk

1 Answers

Guvante

Related questions

Recent Activity

Donate For Us