I am just starting to work with F# and am trying to understand typical idioms and effective ways of thinking and working.
The task at hand is a simple transform of a tab-delimited file to one which is comma-delimited. A typical input line will look like:
let line = "@ES# 01/31/2006 13:31:00 1303.00 1303.00 1302.00 1302.00 2514 0"
I started out with looping code like this:
// inFile and outFile defined in preceding code not shown here
for line in File.ReadLines(inFile) do
    let typicalArray = line.Split '\t'
    let transformedLine = typicalArray |> String.concat ","
    outFile.WriteLine(transformedLine)
I then replaced the split/concat pair of operations with a single Regex.Replace():
for line in File.ReadLines(inFile) do
    let transformedLine = Regex.Replace(line, "\t", ",")
    outFile.WriteLine(transformedLine)
And now, finally, I have replaced the looping with a pipeline:
File.ReadLines(inFile)
|> Seq.map (fun x -> Regex.Replace(x, "\t", ","))
|> Seq.iter (fun y -> outFile.WriteLine(y))
// other housekeeping code below here not shown
While all versions work, the final version seems to me the most intuitive. Is this how a more experienced F# programmer would accomplish this task?
I think all three versions are perfectly fine, idiomatic code that F# experts would write.
I generally prefer writing code using built-in language features (like for loops and if conditions) if they let me solve the problem I have. These are imperative, but I think using them is a good idea when the API requires imperative code (like outFile.WriteLine). As you mentioned, you started with this version (and I would do the same).
Using higher-order functions is nice too, although I would probably do that only if I wanted to write a data transformation and get a new sequence or list of lines. This would be handy if you were using File.WriteAllLines instead of writing lines one by one. That could also be done by simply wrapping your second version in a sequence expression:
let transformed =
    seq { for line in File.ReadLines(inFile) -> Regex.Replace(line, "\t", ",") }
File.WriteAllLines(outFilePath, transformed)
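For completeness, here is a minimal self-contained sketch of that sequence-expression version. The helper names toCsvLine and convertFile are hypothetical, introduced only for illustration, and the sketch assumes the input and output paths point to different files (the input is read lazily while the output is being written).

```fsharp
open System.IO
open System.Text.RegularExpressions

// Hypothetical helper: replaces every tab in a single line with a comma.
let toCsvLine (line: string) = Regex.Replace(line, "\t", ",")

// Hypothetical helper: lazily transforms each input line and writes the
// whole sequence with one call. Assumes inPath and outPath are different files.
let convertFile (inPath: string) (outPath: string) =
    let transformed =
        seq { for line in File.ReadLines(inPath) -> toCsvLine line }
    File.WriteAllLines(outPath, transformed)
```

With this shape, File.WriteAllLines takes care of opening and closing the output file, so no separate writer needs to be managed.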
I do not think there is any objective reason to prefer one of the versions. My personal stylistic preference is to use for and refactor to sequence expressions (if needed), but others will likely disagree.