F#: How to window a sequence based on a predicate rather than a fixed length



Given the following input sequence, I would like to generate the desired output. I know that Seq.windowed can be used to almost get the desired result if all the windows have a fixed length. In this case, however, they do not have a fixed length: I would like to start a new sequence whenever "a" is encountered. Is this possible with the standard collections library?

let inputSequence = 
      ["a"; "b"; "c";
       "a"; "b"; "c"; "d";
       "a"; "b"; 
       "a"; "d"; "f";
       "a"; "x"; "y"; "z"]

let desiredResult = 
   [["a"; "b"; "c"]
    ["a"; "b"; "c"; "d"]
    ["a"; "b"]
    ["a"; "d"; "f"]
    ["a"; "x"; "y"; "z"]]

Here's a way that uses mutable state but is pretty concise:

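// Chunk counter, bumped every time a chunk start ("a") is seen
let mutable i = 0
// Tag each element with the number of the chunk it belongs to
[ for x in inputSequence do
    if x = "a" then i <- i + 1
    yield i, x ]
// Group by chunk number (groups come out in order of first appearance)
|> List.groupBy fst
// Drop the chunk numbers, keeping only the elements
|> List.map snd
|> List.map (List.map snd)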
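The same counter-based grouping can also be written without a mutable variable. A minimal sketch, assuming you compute the chunk numbers up front with List.scan and then zip them with the input (the names chunkNumbers and chunks are just for illustration):

```fsharp
// Sample input from the question
let inputSequence =
    [ "a"; "b"; "c"; "a"; "b"; "c"; "d"; "a"; "b";
      "a"; "d"; "f"; "a"; "x"; "y"; "z" ]

// Running chunk number for each element: bumped by one at every "a".
// List.scan also returns the initial state (0), so List.tail drops it
// to keep both lists the same length.
let chunkNumbers =
    inputSequence
    |> List.scan (fun n x -> if x = "a" then n + 1 else n) 0
    |> List.tail

// Pair each element with its chunk number, group, then drop the numbers.
let chunks =
    List.zip chunkNumbers inputSequence
    |> List.groupBy fst
    |> List.map (snd >> List.map snd)
```

Here chunks should equal desiredResult from the question. Like the mutable version, it relies on List.groupBy keeping the groups in the order their keys first appear.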

As mentioned in the other answer, you can fairly easily implement this using recursion or using fold. To make the recursive version more useful, you can define a function chunkAt that starts a new chunk whenever it encounters a specific value:

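let chunkAt start list = 
  // chunk: the chunk being built, in reverse order
  // chunks: the completed chunks, in reverse order
  let rec loop chunk chunks list = 
    match list with
    // Empty input: return no chunks rather than one empty chunk
    | [] when List.isEmpty chunk -> List.rev chunks
    // End of input: close the last chunk and restore the original order
    | [] -> List.rev ((List.rev chunk)::chunks)
    // Start value at the very beginning: there is nothing to close yet
    | x::xs when x = start && List.isEmpty chunk -> loop [x] chunks xs
    // Start value: close the current chunk and open a new one
    | x::xs when x = start -> loop [x] ((List.rev chunk)::chunks) xs
    // Any other value: add it to the current chunk
    | x::xs -> loop (x::chunk) chunks xs
  loop [] [] list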

Then you can run it on your input sequence using:

chunkAt "a" inputSequence
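Since the question asks for a split driven by a predicate, and fold was mentioned as an alternative to recursion, here is a sketch of a predicate-taking variant built on List.foldBack. The name chunkWhen is hypothetical and is not part of FSharp.Core:

```fsharp
/// Hypothetical helper (not in FSharp.Core): splits a list into chunks,
/// starting a new chunk at every element for which isStart returns true.
let chunkWhen (isStart: 'T -> bool) (list: 'T list) : 'T list list =
    // Walk the list from the right, so each element can simply be consed
    // onto the chunk that follows it; no List.rev is needed.
    // State: (elements of the chunk being built, completed chunks)
    let folder x (current, chunks) =
        let current = x :: current
        if isStart x then ([], current :: chunks) // x opens this chunk
        else (current, chunks)
    let leading, chunks = List.foldBack folder list ([], [])
    // Elements before the first start element (if any) form their own chunk
    if List.isEmpty leading then chunks else leading :: chunks

let result =
    chunkWhen (fun x -> x = "a") ["a"; "b"; "c"; "a"; "d"; "a"; "x"; "y"]
// [["a"; "b"; "c"]; ["a"; "d"]; ["a"; "x"; "y"]]
```

For the question's input, chunkWhen ((=) "a") inputSequence should give the same result as chunkAt "a" inputSequence, but any predicate can be used to decide where a chunk starts.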

Although there is no standard library function that does this, you can use the data series manipulation library Deedle, which implements a fairly rich set of chunking functions. To do this with Deedle, turn your sequence into a series indexed by ordinal position and then use Series.chunkWhile:

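open Deedle

// Series.ofValues keys the values by ordinal position (0, 1, 2, ...)
let s = Series.ofValues inputSequence
// Keep extending the current chunk while the value at the current key is not "a"
let chunked = s |> Series.chunkWhile (fun _ k2 -> s.[k2] <> "a")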

If you want to turn the data back into a list, use the Values property of the returned series and of each chunk:

chunked.Values |> Seq.map (fun chunk -> List.ofSeq chunk.Values) |> List.ofSeq
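All of the versions above work on an in-memory list (or, with Deedle, a series built from the whole input). If the input is a lazy seq, possibly an infinite one, a sketch using a sequence expression with a mutable buffer can emit each chunk as soon as the next start element arrives. The name chunkSeqWhen is hypothetical:

```fsharp
/// Hypothetical helper: lazily splits a sequence into chunks, starting a
/// new chunk at every element for which isStart returns true.
let chunkSeqWhen (isStart: 'T -> bool) (source: seq<'T>) : seq<'T list> =
    seq {
        // Buffer for the chunk currently being built
        let buffer = ResizeArray<'T>()
        for x in source do
            // A start element closes the previous chunk, if there is one
            if isStart x && buffer.Count > 0 then
                yield List.ofSeq buffer
                buffer.Clear()
            buffer.Add x
        // Emit the final chunk once the source is exhausted
        if buffer.Count > 0 then
            yield List.ofSeq buffer
    }

// Usage on an infinite sequence: enumeration stops after two chunks
let firstTwo =
    Seq.initInfinite (fun i -> if i % 3 = 0 then "a" else "b")
    |> chunkSeqWhen (fun x -> x = "a")
    |> Seq.take 2
    |> List.ofSeq
// [["a"; "b"; "b"]; ["a"; "b"; "b"]]
```

The buffer is copied with List.ofSeq before it is cleared, so each yielded chunk is an independent immutable list, and because the buffer is created inside the seq { } block, every enumeration of the result starts with a fresh one.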
