Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write efficient list/seq functions in F#? (mapFoldWhile)

Tags:

f#

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.

I don't want to use mapFold because it will transform the entire list, but I want it to stop as soon as an invalid state (i.e. None) is found.

This was myfirst attempt:

let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
  let rec mapRec f state list results =
    match list with 
    | [] -> (List.rev results, state)
    | item :: tail -> 
      let (result, newState) = f state item
      match newState with 
      | Some x -> mapRec f newState tail (result :: results)
      | None -> ([], None)
  mapRec f state list []

The List.rev irked me, since the point of the exercise was to exit early and constructing a new list ought to be even slower.

So I looked up what F#'s very own map does, which was:

let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list

The ominous Microsoft.FSharp.Primitives.Basics.List.map can be found here and looks like this:

let map f x = 
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) -> 
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons

The consNoTail stuff is also in this file:

// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)

So I guess it turns out that F#'s immutable lists are actually mutable because performance? I'm a bit worried about this, having used the prepend-then-reverse list approach as I thought it was the "way to go" in F#.

I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?

I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).

Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a Collections.Generics.List?

like image 304
enzi Avatar asked Jun 27 '16 16:06

enzi


People also ask

How to create a list in f#?

F# Lists Creating lists A way to create a list is to place elements in two square brackets, separated by semicolons. The elements must have the same type. It is also possible to define lists of functions, of elements of a type defined previously, of objects of a class, etc.

What is SEQ in F#?

Seq. groupBy takes a sequence and a function that generates a key from an element. The function is executed on each element of the sequence. Seq. groupBy returns a sequence of tuples, where the first element of each tuple is the key and the second is a sequence of elements that produce that key.

What is the underlying difference between a sequence and a list in F #?

The list is created on declaration, but elements in the sequence are created as they are needed. As a result, sequences are able to represent a data structure with an arbitrary number of elements: > seq { 1I ..

What is yield in F#?

F# Sequence Workflows yield and yield! (pronounced yield bang) inserts all the items of another sequence into this sequence being built. Or, in other words, it appends a sequence. (In relation to monads, it is bind .)


1 Answers

In most cases, using List.rev is a perfectly sufficient solution.

You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not particularly good example. F# list functions are used almost everywhere so it might be a good trade-off, but I would not follow it in most situations.

Running your function with the following:

let l = [ 1 .. 1000000 ]

#time 
mapFoldWhile (fun s v -> 0, s) (Some 1) l

I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.

For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:

input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)

Using Seq.scan generates a sequence of sums that is over the whole input, but since this is lazily generated, using Seq.takeWhile stops the computation as soon as the exit condition happens.

like image 169
Tomas Petricek Avatar answered Sep 19 '22 00:09

Tomas Petricek