I have the following function that converts CSV files to a specific txt schema (expected by the CNTKTextFormat Reader):
open System.IO
open FSharp.Data;
open Deedle;
let convert (inFileName : string) =
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map(fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ ->
            failwith "Bad data."
    )
    |> ignore
Strangely, the output file is empty after running in the F# interactive panel, and that printf prints nothing at all.
If I remove the ignore to make sure that there are actual rows being processed (evidenced by returning a seq of nulls), instead of an empty file I get:
val it : seq<unit> = Error: Cannot write to a closed TextWriter.
Before, I was declaring the StreamWriter using let and disposing of it manually, but that also produced empty files or files with just a few lines (say 5 out of thousands).
What is happening here? Also, how can I fix the file writing?
Seq.map returns a lazy sequence which is not evaluated until it is iterated over. You are not currently iterating over it within convert, so no rows are processed. If you return a Seq<unit> and iterate over it outside convert, outFile will already be closed, which is why you see the exception.
You should use Seq.iter instead:
data.Rows.Observations
|> Seq.iter (fun kvp -> ...)
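To see the laziness in isolation, here is a minimal sketch that is independent of Deedle and of the file writing. The names calls, doubled, callsBeforeIteration and callsAfterIteration are made up for illustration; the ref cell only counts how often the mapping function runs.

```fsharp
// Counter that records how many times the mapping function has run.
let calls = ref 0

// Seq.map only describes the work; the lambda is not invoked here.
let doubled =
    [1; 2; 3]
    |> Seq.map (fun x ->
        calls.Value <- calls.Value + 1
        x * 2)

// Still zero, because nothing has iterated over the sequence yet.
let callsBeforeIteration = calls.Value

// Forcing the sequence (here with Seq.toList) runs the lambda once per element.
let result = Seq.toList doubled
let callsAfterIteration = calls.Value
```

In the question, the equivalent of Seq.toList only happens when F# Interactive prints the returned sequence, and by then the use binding has already disposed outFile. Seq.iter forces the iteration inside convert, while the writer is still open.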
Apart from the solutions already mentioned, you could also avoid the StreamWriter altogether and use one of the standard .NET functions, File.WriteAllLines. You would prepare a sequence of converted lines, and then write that to the file:
let convert (inFileName : string) =
    let lines =
        Frame.ReadCsv(inFileName).Rows.Observations
        |> Seq.map(fun kvp ->
            let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
            match row with
            | label::data ->
                let body = data |> List.map string |> String.concat " "
                printf "%A" label
                sprintf "|labels %A |features %s" label body
            | _ ->
                failwith "Bad data."
        )
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    File.WriteAllLines(outFileName, lines)
Update based on the discussion in the comments: Here's a solution that avoids Deedle altogether. I'm making some assumptions about your input file format here, based on another question you posted today: Label is in column 1, features follow.
let lines =
    File.ReadLines inFileName
    |> Seq.map (fun line ->
        match Seq.toList(line.Split ',') with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            printf "%A" label
            // label is a string here, so use %s; %A would wrap it in quotes
            sprintf "|labels %s |features %s" label body
        | _ ->
            failwith "Bad data."
    )
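For completeness, here is one way the Deedle-free version could be wrapped into a full convert function. This is a sketch under the same assumptions (label in column 1, features after it), plus two more: the file has no header row, and fields contain no quoted commas. The helper name toCntkLine is made up for illustration, and Path.ChangeExtension is swapped in for the Substring arithmetic.

```fsharp
open System.IO

// Hypothetical helper: turn one CSV line into a CNTK text-format line.
// Assumes the label is in the first column and the features follow.
let toCntkLine (line : string) =
    match List.ofArray (line.Split ',') with
    | label :: features ->
        sprintf "|labels %s |features %s" label (String.concat " " features)
    | [] ->
        failwith "Bad data."

// Assumes the input has no header row; insert Seq.skip 1 before the map if it does.
let convert (inFileName : string) =
    let outFileName = Path.ChangeExtension(inFileName, ".txt")
    let lines = File.ReadLines inFileName |> Seq.map toCntkLine
    // WriteAllLines enumerates the lazy sequence itself, so every row is written.
    File.WriteAllLines(outFileName, lines)
```

Because File.WriteAllLines does the iterating, the lazy Seq.map is not a problem here: the lines are produced and written one after another, and the file is closed once the sequence is exhausted.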