
In F#, How do I use Seq.unfold in the context of a larger pipeline?

Tags: f#, seq.unfold

I have a CSV file with two columns, text and count. The goal is to transform the file from this:

some text once,1
some text twice,2
some text thrice,3

To this:

some text once,1
some text twice,1
some text twice,1
some text thrice,1
some text thrice,1
some text thrice,1

repeating each line count times and spreading the count over that many lines.

This seems to me like a good candidate for Seq.unfold, generating the additional lines as we read the file. I have the following generator function:

let expandRows (text:string, number:int32) =
    if number = 0 
    then None
    else
        let element = text                  // "element" will be in the generated sequence
        let nextState = (element, number-1) // threaded state replacing looping 
        Some (element, nextState)

FSI yields the following function signature:

val expandRows : text:string * number:int32 -> (string * (string * int32)) option

Executing the following in FSI:

Seq.unfold expandRows ("some text thrice", 3)

yields the expected:

val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"]

The question is: how do I plug this into the context of a larger ETL pipeline? For example:

File.ReadLines(inFile)                  
    |> Seq.map createTupleWithCount
    |> Seq.unfold expandRows // type mismatch here
    |> Seq.iter outFile.WriteLine

The error below is on expandRows in the context of the pipeline.

Type mismatch. 
Expecting a 'seq<string * int32> -> ('a * seq<string * int32>) option'    
but given a     'string * int32 -> (string * (string * int32)) option' 
The type    'seq<string * int32>' does not match the type 'string * int32'

I was expecting expandRows to return a seq of string, as in my isolated test. As that is neither the "Expecting" nor the "given" type, I'm confused. Can someone point me in the right direction?

A gist for the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498

akucheck asked Dec 29 '16 06:12



2 Answers

Seq.map produces a sequence, but Seq.unfold takes a single initial state value, not a sequence. So you can't pipe the output of Seq.map directly into Seq.unfold. You need to apply Seq.unfold to each element instead.

But then, for each element, Seq.unfold will produce a sequence, so the overall result will be a sequence of sequences. You can gather all those "subsequences" into a single sequence with Seq.collect:

File.ReadLines(inFile) 
    |> Seq.map createTupleWithCount 
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.iter outFile.WriteLine

Seq.collect takes a function and an input sequence. For every element of the input sequence, the function is supposed to produce another sequence, and Seq.collect concatenates all those sequences into one. You may think of Seq.collect as Seq.map and Seq.concat combined in one function. Also, if you're coming from C#, the equivalent there is SelectMany.
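To see this without touching the file system, here is a minimal sketch that runs in FSI. The `rows` list is a hypothetical in-memory stand-in for the parsed CSV rows (it is not part of the original code); `expandRows` is the generator from the question. It compares Seq.collect against the equivalent Seq.map followed by Seq.concat:

```fsharp
// Generator from the question: yields `text` until the counter reaches zero.
let expandRows (text: string, number: int32) =
    if number = 0
    then None
    else Some (text, (text, number - 1))

// Hypothetical stand-in for the rows produced by createTupleWithCount.
let rows =
    [ ("some text once", 1)
      ("some text twice", 2)
      ("some text thrice", 3) ]

// Seq.collect runs Seq.unfold on each row and flattens the results.
let viaCollect =
    rows
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.toList

// The same thing spelled out in two steps: map, then concat.
let viaMapConcat =
    rows
    |> Seq.map (Seq.unfold expandRows)
    |> Seq.concat
    |> Seq.toList

printfn "%A" viaCollect
printfn "Same result: %b" (viaCollect = viaMapConcat)
```

Both lists contain six strings: one "once", two "twice" and three "thrice".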

Fyodor Soikin answered Jan 03 '23 21:01


In this case, since you simply want to repeat a value a number of times, there's no reason to use Seq.unfold. You can use Seq.replicate instead:

// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text

You can use Seq.collect to compose it:

File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect expandRows
|> Seq.iter outFile.WriteLine

In fact, the only work performed by this version of expandRows is to 'unpack' a tuple and pass its values, in swapped order, to the curried Seq.replicate.

While the F# core library doesn't come with generic functions for that, you can easily define them (and other similarly useful functions):

module Tuple2 =
    let curry f x y = f (x, y)    
    let uncurry f (x, y) = f x y    
    let swap (x, y) = (y, x)

This would enable you to compose your pipeline from well-known functional building blocks:

File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect (Tuple2.swap >> Tuple2.uncurry Seq.replicate)
|> Seq.iter outFile.WriteLine
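
For experimenting in FSI, the pipeline can be sketched end to end on in-memory data. Everything specific here is an assumption: the question never shows createTupleWithCount, so the version below is a hypothetical parser that splits at the last comma, the `input` list replaces the file, and the final Seq.map appends ",1" only to match the desired output shown in the question:

```fsharp
// Hypothetical parser (not from the question): splits "text,count" at the last comma.
let createTupleWithCount (line: string) =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), int (line.Substring(i + 1)))

// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text

// Hypothetical stand-in for File.ReadLines(inFile).
let input =
    [ "some text once,1"
      "some text twice,2"
      "some text thrice,3" ]

let output =
    input
    |> Seq.map createTupleWithCount
    |> Seq.collect expandRows
    |> Seq.map (fun text -> text + ",1") // each expanded line carries a count of 1
    |> Seq.toList

output |> List.iter (printfn "%s")
```

Splitting at the last comma keeps any commas inside the text column intact, but it is not a full CSV parser; quoted fields would need a proper CSV library.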

Mark Seemann answered Jan 03 '23 21:01