F#: Writing to a file changes behavior depending on the return type

Tags: f#, cntk

I have the following function that converts CSV files to the specific text format expected by the CNTKTextFormat reader:

open System.IO
open FSharp.Data;
open Deedle;

let convert (inFileName : string) = 
    let data = Frame.ReadCsv(inFileName)
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    use outFile = new StreamWriter(outFileName, false)
    data.Rows.Observations
    |> Seq.map(fun kvp ->
        let row = kvp.Value |> Series.observations |> Seq.map(fun (k,v) -> v) |> Seq.toList
        match row with
        | label::data ->
            let body = data |> List.map string |> String.concat " "
            outFile.WriteLine(sprintf "|labels %A |features %s" label body)
            printf "%A" label
        | _ ->
            failwith "Bad data."
    )
    |> ignore

Strangely, after running this in the F# Interactive panel, the output file is empty, and the printf produces no output at all.

If I remove the ignore to confirm that rows are actually being processed (which the returned seq of nulls suggests), then instead of an empty file I get:

val it : seq<unit> = Error: Cannot write to a closed TextWriter.

Before that, I declared the StreamWriter with let and disposed of it manually, but that also produced empty files, or files with only a few lines (say 5 out of thousands).

What is happening here, and how can I fix the file writing?

asked Dec 15 '22 at 01:12 by villasv


2 Answers

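Seq.map returns a lazy sequence, which is not evaluated until something iterates over it. Nothing iterates over it within convert (ignore simply discards the unevaluated sequence), so no rows are processed. If you instead return the seq<unit> and iterate over it outside convert (F# Interactive does this when it prints the first few elements of it), outFile has already been closed by the use binding, which is why you see the exception.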
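A minimal sketch of the closed-writer half of this, using a StringWriter in place of the StreamWriter so it runs without touching the disk (lazyWrites and raisedDisposed are hypothetical names). The function returns before anything has been written, so enumerating its result afterwards should fail with an ObjectDisposedException, the same "Cannot write to a closed TextWriter." error as in the question:

```fsharp
open System
open System.IO

// Same shape as the question's convert: `use` closes the writer when the
// function returns, but Seq.map has only described the writes at that point.
let lazyWrites (items: string list) : seq<unit> =
    use writer = new StringWriter()
    items |> Seq.map (fun s -> writer.WriteLine s)

// Enumerating afterwards (as F# Interactive does to print `it`) hits the closed writer.
let raisedDisposed =
    try
        lazyWrites [ "a"; "b" ] |> Seq.iter ignore
        false
    with :? ObjectDisposedException as ex ->
        printfn "%s" ex.Message
        true
```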

You should use Seq.iter instead, which applies the function to every element immediately, while outFile is still open:

data.Rows.Observations
    |> Seq.iter (fun kvp -> ...)
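Here is a self-contained sketch contrasting the two, assuming a plain list of "label,f1,f2,..." strings stands in for the Deedle rows and a StringWriter stands in for the output file (rows, formatRow, writeWithMap and writeWithIter are hypothetical names, not part of Deedle or the question):

```fsharp
open System.IO

// Hypothetical stand-in for the Deedle rows: each item is "label,f1,f2,...".
let rows = [ "1,0.5,0.25"; "0,0.75,0.125" ]

// Formats one row in the "|labels ... |features ..." layout from the question.
let formatRow (row: string) =
    match row.Split ',' |> Array.toList with
    | label :: features -> sprintf "|labels %s |features %s" label (String.concat " " features)
    | [] -> failwith "Bad data."

// Broken: Seq.map only builds a lazy sequence and ignore throws it away
// unevaluated, so WriteLine is never called.
let writeWithMap (writer: TextWriter) =
    rows |> Seq.map (fun r -> writer.WriteLine(formatRow r)) |> ignore

// Fixed: Seq.iter runs the function for every element right away.
let writeWithIter (writer: TextWriter) =
    rows |> Seq.iter (fun r -> writer.WriteLine(formatRow r))

let viaMap = new StringWriter()
writeWithMap viaMap
let viaIter = new StringWriter()
writeWithIter viaIter
printfn "Seq.map + ignore wrote %d characters" (viaMap.ToString().Length)
printfn "Seq.iter wrote:\n%s" (viaIter.ToString())
```

With Seq.map plus ignore nothing is written; with Seq.iter both rows end up in the writer.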
answered Dec 25 '22 at 16:12 by Lee


Apart from the solutions already mentioned, you could also avoid the StreamWriter altogether and use one of the standard .NET methods, File.WriteAllLines. Prepare a sequence of converted lines, then write it to the file in one call:

let convert (inFileName : string) = 
    let lines = 
        Frame.ReadCsv(inFileName).Rows.Observations
        |> Seq.map(fun kvp ->
            let row = kvp.Value |> Series.observations |> Seq.map snd |> Seq.toList
            match row with
            | label::data ->
                let body = data |> List.map string |> String.concat " "
                printf "%A" label
                sprintf "|labels %A |features %s" label body
            | _ ->
                failwith "Bad data."
        )
    let outFileName = inFileName.Substring(0, (inFileName.Length - 4)) + ".txt"
    File.WriteAllLines(outFileName, lines)
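Laziness is not a problem here because File.WriteAllLines enumerates the sequence itself while the file is open. A small sketch that checks this by counting how often the mapping body runs; the sample rows, the toCntkLines helper and the temp output path are all hypothetical:

```fsharp
open System.IO

// Counts how many times the mapping body has actually run.
let evaluated = ref 0

let toCntkLines (rows: seq<string>) : seq<string> =
    rows
    |> Seq.map (fun row ->
        evaluated.Value <- evaluated.Value + 1
        match row.Split ',' |> Array.toList with
        | label :: features -> sprintf "|labels %s |features %s" label (String.concat " " features)
        | [] -> failwith "Bad data.")

let lines = toCntkLines [ "1,0.5,0.25"; "0,0.75,0.125" ]
printfn "Rows converted before writing: %d" evaluated.Value   // 0: still lazy

// Hypothetical output path in the temp directory.
let outFileName = Path.Combine(Path.GetTempPath(), "cntk-lazy-demo.txt")
File.WriteAllLines(outFileName, lines)   // enumerates `lines` while the file is open
printfn "Rows converted after writing: %d" evaluated.Value    // 2
```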

Update, based on the discussion in the comments: here is a solution that avoids Deedle altogether. I'm making some assumptions about your input file format, based on another question you posted today: the label is in the first column and the features follow.

// Drop-in replacement for the Deedle-based lines binding in convert above.
// label is a string here, so %s is used: %A would wrap it in quotes.
let lines = 
    File.ReadLines inFileName
    |> Seq.map (fun line -> 
        match Seq.toList(line.Split ',') with
        | label::data ->
            let body = data |> String.concat " "
            printf "%s" label
            sprintf "|labels %s |features %s" label body
        | _ ->
            failwith "Bad data."
    )
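Put together, the Deedle-free variant can be sketched as a complete script. Like the answer, it assumes the label is in the first column; it additionally assumes the file has no header row. Path.ChangeExtension is swapped in for the Substring(0, Length - 4) trick, convert returns the output path for convenience, and the demo file name is hypothetical:

```fsharp
open System.IO

// Converts "label,f1,f2,..." lines to "|labels label |features f1 f2 ...".
// Assumes: no header row, label in the first column.
let convert (inFileName: string) =
    let lines =
        File.ReadLines inFileName             // lazy: lines are read on demand
        |> Seq.map (fun line ->
            match line.Split ',' |> Array.toList with
            | label :: features ->
                sprintf "|labels %s |features %s" label (String.concat " " features)
            | [] -> failwith "Bad data.")
    let outFileName = Path.ChangeExtension(inFileName, ".txt")
    File.WriteAllLines(outFileName, lines)    // reading and writing both happen here
    outFileName

// Usage with a throwaway input file (hypothetical name).
let input = Path.Combine(Path.GetTempPath(), "cntk-convert-demo.csv")
File.WriteAllLines(input, [| "1,0.5,0.25"; "0,0.75,0.125" |])
let output = convert input
for l in File.ReadAllLines output do printfn "%s" l
```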
answered Dec 25 '22 at 16:12 by Anton Schwaighofer