Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

File transform in F#

Tags:

f#

I am just starting to work with F# and trying to understand typical idoms and effective ways of thinking and working.

The task at hand is a simple transform of a tab-delimited file to one which is comma-delimited. A typical input line will look like:

let line = "@ES#    01/31/2006 13:31:00 1303.00 1303.00 1302.00 1302.00 2514    0"

I started out with looping code like this:

// inFile and outFile defined in preceding code not shown here

for line in File.ReadLines(inFile) do
    let typicalArray = line.Split '\t'
    let transformedLine = typicalArray |> String.concat ","   
    outFile.WriteLine(transformedLine)

I then replaced the split/concat pair of operations with a single Regex.Replace():

for line in File.ReadLines(inFile) do   
    let transformedLine = Regex.Replace(line, "\t",",")
    outFile.WriteLine(transformedLine)

And now, finally, have replaced the looping with a pipeline:

File.ReadLines(inFile)
    |> Seq.map  (fun x -> Regex.Replace(x, "\t", ","))
    |> Seq.iter (fun y -> outFile.WriteLine(y))

  // other housekeeping code below here not shown

While all versions work, the final version seems to me the most intuitive. Is this how a more experienced F# programmer would accomplish this task?

like image 508
akucheck Avatar asked Jun 30 '13 20:06

akucheck


1 Answers

I think all three versions are perfectly fine, idiomatic code that F# experts would write.

I generally prefer writing code using built-in language features (like for loops and if conditions) if they let me solve the problem I have. These are imperative, but I think using them is a good idea when the API requires imperative code (like outFile.WriteLine). As you mentioned - you started with this version (and I would do the same).

Using higher-order functions is nice too - although I would probably do that only if I wanted to write data transformation and get a new sequence or list of lines - this would be handy if you were using File.WriteAllLines instead of writing lines one-by-one. Although, that could be also done by simply wrapping your second version with sequence expression:

let transformed = 
    seq { for line in File.ReadLines(inFile) -> Regex.Replace(line, "\t",",") }
File.WriteAllLines(outFilePath, transformed) 

I do not think there is any objective reason to prefer one of the versions. My personal stylistic preference is to use for and refactor to sequence expressions (if needed), but others will likely disagree.

like image 54
Tomas Petricek Avatar answered Oct 30 '22 15:10

Tomas Petricek