 

Accessing entries in a CSV file for computation in F#

Tags:

arrays

math

csv

f#

How can I access the entries in a CSV file in order to perform calculations on them in F#?

I can read the CSV file into memory in the usual way, but once there I am stuck.

Ideally I would just create arrays from the columns and then use Array.map2 to perform calculations.

So if column 1 is some website usage metric, and column 2 is the number of users that reached the value in column 1 (say, made 6 visits to a website), we could calculate the mean number of visits by multiplying each entry in an array made from column 1 by the corresponding entry in an array made from column 2, summing those products, and dividing by the Array.sum of column 2.
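The column-array approach described above can be sketched like this, assuming the two columns have already been parsed into `int[]` values. The arrays are hard-coded from the sample data in the question purely for illustration.

```fsharp
// Column arrays hard-coded from the sample data (assumption: in practice
// these would come from the parsed CSV file).
let visits = [| 1; 2; 3; 4; 5; 6; 7; 10 |]
let counts = [| 8; 9; 5; 3; 2; 1; 1; 1 |]

// Multiply each visits value by its user count, element by element.
let products = Array.map2 (fun v c -> v * c) visits counts

// Weighted mean: total visits divided by total users.
let mean = float (Array.sum products) / float (Array.sum counts)

printfn "%.2f" mean   // prints 2.87
```

Keeping the sums as `int` and converting only at the final division avoids any floating-point rounding in the intermediate totals.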

I have tried the CSV-to-array code on F# Snippets, http://fssnip.net/3T, but the array it produces for me is a series of string tuples.
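If the parsing code does hand back `(string * string)` pairs, a minimal sketch of turning them into numeric column arrays is to convert each pair to ints and then `Array.unzip` the result. The `rows` value below is a hypothetical stand-in for whatever the snippet returns.

```fsharp
// Hypothetical stand-in for the snippet's output: one (visits, count)
// pair of strings per data row, header already removed.
let rows : (string * string)[] =
    [| ("1", "8"); ("2", "9"); ("3", "5"); ("4", "3");
       ("5", "2"); ("6", "1"); ("7", "1"); ("10", "1") |]

// Convert each pair to ints (Trim tolerates stray spaces around fields),
// then split the array of pairs into two column arrays.
let visits, counts =
    rows
    |> Array.map (fun (v, c) -> int (v.Trim()), int (c.Trim()))
    |> Array.unzip

let mean =
    float (Array.sum (Array.map2 (fun v c -> v * c) visits counts)) / float (Array.sum counts)
```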

Can anyone suggest a better approach?

EDIT: Some sample input would be similar to this:

     Visits Count
     1  8
     2  9
     3  5
     4  3
     5  2
     6  1
     7  1
    10  1

The output should be the mean of the data, in this case 2.87 (to 2 decimal places).
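For reference, the expected value is a count-weighted mean. Writing $v_i$ for the visits value in row $i$ and $c_i$ for the number of users in that row, the sample data gives:

```latex
\bar{v} = \frac{\sum_i v_i c_i}{\sum_i c_i}
        = \frac{1\cdot 8 + 2\cdot 9 + 3\cdot 5 + 4\cdot 3 + 5\cdot 2 + 6\cdot 1 + 7\cdot 1 + 10\cdot 1}
               {8 + 9 + 5 + 3 + 2 + 1 + 1 + 1}
        = \frac{86}{30} \approx 2.87
```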

EDIT 2: The current output from the CSV-to-array code I found is this:

     val it : seq<BookWindow> =
       seq [{Visits = 1; Count = 8;}; {Visits = 2; Count = 9;};
            {Visits = 3; Count = 5;}; {Visits = 4; Count = 3;}; ...]

which is not so useful for calculations...
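A sequence of records is in fact easy to compute with. A minimal sketch, assuming `BookWindow` is a record with integer `Visits` and `Count` fields (inferred from the printed output; the actual definition is not shown), either projects each field into its own array or sums over the records directly:

```fsharp
// Assumed shape of the record type, inferred from the printed output.
type BookWindow = { Visits: int; Count: int }

// Stand-in for the seq<BookWindow> returned by the CSV-to-array code.
let data : seq<BookWindow> =
    [ (1, 8); (2, 9); (3, 5); (4, 3); (5, 2); (6, 1); (7, 1); (10, 1) ]
    |> Seq.map (fun (v, c) -> { Visits = v; Count = c })

// Option 1: pull each field out into its own column array.
let visits = data |> Seq.map (fun r -> r.Visits) |> Array.ofSeq
let counts = data |> Seq.map (fun r -> r.Count) |> Array.ofSeq

// Option 2: skip the arrays and sum over the records directly.
let totalVisits = data |> Seq.sumBy (fun r -> r.Visits * r.Count)
let totalUsers = data |> Seq.sumBy (fun r -> r.Count)
let mean = float totalVisits / float totalUsers
```

Working on the records keeps each visits value paired with its count, so there is no risk of the two column arrays getting out of step.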

Simon Hayward asked Dec 09 '22 20:12


1 Answer

What I do is create a record type so I can use strongly typed operations later on, and then read the text file into a seq<myRecord> very quickly, as in the code below. If I intend to reuse this later on, I usually move the method to the record as a static member fromFile. The seq is very useful if you work with large text files, as I usually do: the file is read lazily, one line at a time, so it uses very little memory.

Edit: this is cleaner:

```fsharp
open System.IO

type myRecord =
    { Visits: int
      Count: int }

    static member fromFile (file: string) =
        file
        |> File.ReadLines       // expose as seq<string>, read lazily
        |> Seq.skip 1           // skip headers
        |> Seq.map (fun s -> s.Split '\t') // split each line into an array (assumes tab-separated columns)
        |> Seq.map (fun a -> { Visits = int a.[0]; Count = int a.[1] }) // and create a record

let mean =
    myRecord.fromFile @"D:\data.csv"
    |> Seq.fold (fun (tv, tc) r -> (tv + r.Visits * r.Count, tc + r.Count)) (0, 0)
    |> (fun t -> float (fst t) / float (snd t))
//val mean : float = 2.866666667
```
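The sample data in the question is aligned with spaces rather than tabs, so splitting on `'\t'` alone may fail on it. A variation, assuming columns separated by any mix of spaces or tabs, splits with `StringSplitOptions.RemoveEmptyEntries` and takes a `seq<string>` so it can be tested without a file. `VisitRow`, `parseLines` and `weightedMean` are hypothetical names used for illustration.

```fsharp
open System

type VisitRow = { Visits: int; Count: int }

/// Hypothetical helper: parse "visits count" lines (header first),
/// with columns separated by any mix of spaces or tabs.
let parseLines (lines: seq<string>) : seq<VisitRow> =
    lines
    |> Seq.skip 1                                               // skip the header line
    |> Seq.filter (fun s -> not (String.IsNullOrWhiteSpace s))  // ignore blank lines
    |> Seq.map (fun s ->
        let a = s.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries)
        { Visits = int a.[0]; Count = int a.[1] })

/// Count-weighted mean computed in a single pass with Seq.fold.
let weightedMean (rows: seq<VisitRow>) =
    let total, users =
        rows |> Seq.fold (fun (tv, tc) r -> tv + r.Visits * r.Count, tc + r.Count) (0, 0)
    float total / float users

let sample =
    [ "Visits Count"; " 1  8"; " 2  9"; " 3  5"; " 4  3";
      " 5  2"; " 6  1"; " 7  1"; "10  1" ]

printfn "%.2f" (weightedMean (parseLines sample))   // prints 2.87
```

With a real file, the same functions compose as `File.ReadLines @"D:\data.csv" |> parseLines |> weightedMean` (after `open System.IO`), keeping the lazy, low-memory reading described above.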
gjvdkamp answered Dec 30 '22 05:12