I am learning F# and the FSharp.Data library. I have a task that requires reading 20 CSV files. Each file has a different number of columns, but the records share the same shape: they are keyed on a date string, and all the remaining columns are floats. I need to run some statistical calculations on the float columns before persisting the results to the database. Although I have all the plumbing logic working, the solution is far from acceptable. I thought I could write a generic top-level function as a driver that loops through all the files, but after days of attempts I am getting nowhere.
The FSharp.Data CSV type provider has the following pattern:
type Stocks = CsvProvider<"../docs/MSFT.csv">
let msft = Stocks.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT")
msft.Data |> Seq.map(fun row -> do something with row)
...
I have tried:
let mainfunc (typefile:string) (datafile:string) =
    let msft = CsvProvider<typefile>.Load(datafile)
    ....
This doesn't work: CsvProvider complains that typefile is not a valid constant expression. I am guessing the type provider needs the file at compile time to deduce the column types, so the inference cannot be deferred until mainfunc is called with the relevant information.
I then tried to pass the type into mainfunc as a parameter. Neither
let mainfunc (typeProvider:CsvProvider<"../docs/MSFT.csv">) =
....
nor
let mainfunc<typeProvider:CsvProvider<"../docs/MSFT.csv">> =
....
worked.
I then tried to pass msft, defined by
type Stocks = CsvProvider<"../docs/MSFT.csv">
let msft = Stocks.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT")
into a mainFunc. According to IntelliSense, msft has type CsvProvider<...> and msft.Data has type seq<CsvProvider<...>>. I tried declaring an input parameter with each of these explicit types, but neither compiles.
Can anyone please help and point me in the right direction? Am I missing something fundamental here? Any .NET type or class can be used in an F# function to explicitly specify a parameter type, but can I do the same with a type from a type provider?
If the answer to the above question is no, what are the alternatives for making the logic generic enough to handle 20 or even 200 different files?
This is related to the question "Type annotation for using a F# TypeProvider type e.g. FSharp.Data.JsonProvider<...>.DomainTypes.Url".
Even though IntelliSense shows you CsvProvider<...>, to reference the type of msft in a type annotation you have to use Stocks, and for the elements of msft.Data you have to use Stocks.Row instead of CsvProvider<...>.Row.
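As a minimal sketch of those annotations (not the asker's actual code): the inline sample text, the Open and Close columns, and the helper names spread and totalSpread are assumptions made so the script is self-contained. It also assumes a recent FSharp.Data, where the row sequence is exposed as Rows rather than the Data property used in the question.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Inline sample so the sketch runs on its own; the question points the
// provider at "../docs/MSFT.csv" instead.
type Stocks = CsvProvider<"Date,Open,Close\n2012-01-27,29.45,29.23\n2012-01-26,29.61,29.50">

// Annotate a single row with the alias: Stocks.Row, not CsvProvider<...>.Row.
let spread (row: Stocks.Row) = row.Close - row.Open

// Annotate the loaded file with the alias itself.
let totalSpread (file: Stocks) =
    file.Rows |> Seq.sumBy spread

// GetSample() loads the same data that was used for type inference.
let total = totalSpread (Stocks.GetSample())
printfn "Total spread: %A" total
```

Note that this only makes one provided type usable in signatures. A function annotated with Stocks.Row still cannot accept rows from a provider built on a different file.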
If you want to do something dynamic, you can get the column names with msft.Headers, and you can get the types of the columns using Microsoft.FSharp.Reflection.FSharpType.GetTupleElements(typeof<Stocks.Row>) (this works because the row is erased to a tuple at runtime).
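A rough sketch of that inspection, assuming an inline sample in place of the real file and a recent FSharp.Data in which Headers is an option of string array. The names and types bindings are only illustrative.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data
open Microsoft.FSharp.Reflection

// Inline sample standing in for "../docs/MSFT.csv" (an assumption).
type Stocks = CsvProvider<"Date,Open,Close\n2012-01-27,29.45,29.23">
let msft = Stocks.GetSample()

// Headers is optional because a file may have no header row.
let names = defaultArg msft.Headers [||]

// The provided row type is erased to a tuple, so its element types
// are the inferred column types.
let types = FSharpType.GetTupleElements(typeof<Stocks.Row>)

Array.zip names types
|> Array.iter (fun (name, ty) -> printfn "%s : %s" name ty.Name)
```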
EDIT:
If the formats are incompatible, and you're dealing with dynamic data that doesn't conform to a common format, you might want to use CsvFile instead (http://fsharp.github.io/FSharp.Data/library/CsvFile.html), but you'll lose all the type safety of the type provider. You might also consider using Deedle instead (http://bluemountaincapital.github.io/Deedle/).
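One way the CsvFile route could look for files shaped like the ones in the question (first column a date key, every other column a float). This is a sketch under assumptions: columnMeans is a hypothetical helper, the mean stands in for whatever statistics are actually needed, and the inline text replaces a real file.

```fsharp
#r "nuget: FSharp.Data"
open System
open System.Globalization
open FSharp.Data

/// Hypothetical helper: returns (column name, mean) for every column after
/// the first, treating the first column as the date key.
let columnMeans (csv: CsvFile) =
    let headers = defaultArg csv.Headers [||]
    // Materialise the rows once so they can be traversed per column.
    let rows = csv.Rows |> Seq.toArray
    [ for i in 1 .. headers.Length - 1 ->
        let values =
            rows
            |> Array.map (fun row ->
                Double.Parse(row.Columns.[i], CultureInfo.InvariantCulture))
        headers.[i], Array.average values ]

// Inline data for the demo; a real driver would call CsvFile.Load on each path.
let demo = CsvFile.Parse("Date,A,B\n2012-01-01,1.0,10.0\n2012-01-02,3.0,30.0")
let means = columnMeans demo
means |> List.iter (fun (name, mean) -> printfn "%s: %f" name mean)
```

Because columnMeans depends only on CsvFile, the same function can be mapped over 20 or 200 paths regardless of how many columns each file has. The cost is the one mentioned above: a non-numeric cell is only discovered at runtime, when Double.Parse throws.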