
I'm having trouble using CsvProvider on CSV files. For this example I used the file below:
http://spatialkeydocs.s3.amazonaws.com/FL_insurance_sample.csv.zip
    type statsProvider = CsvProvider<"../../FL_insurance_sample.csv",",">
    let stats = statsProvider.Load("../../FL_insurance_sample.csv")
    let firstRow = stats.Rows |> Seq.head
CsvProvider only returns the data from the first column. It correctly identifies the number of columns (18) and their names, yet the rows have type string rather than a tuple or a structure.
In the screenshot, firstRow should have a specific type (e.g. a tuple or a structure), not string.
What am I doing wrong? I'm using Visual Studio 2017, F# 4.1, .NET 4.5.2 and FSharp.Data 2.3.3.
Note: this happens with at least three other CSV files. I picked this particular one for demonstration only.
I can't reproduce your problem: the sample CSV file you provided worked just fine for me. But then, I'm using VS Code, not Visual Studio; it's possible that the source of the problem is somewhere in Visual Studio 2017 rather than in FSharp.Data. Here's what I did:
Ran .paket/paket.bootstrapper.exe.
Ran paket init.
Edited the paket.dependencies file to add FSharp.Data.
Ran paket install.
Ran paket generate-load-scripts, which created a bunch of scripts in the .paket/load folder to load all the dependencies at once. (I love this feature for scripting!)
Created script.fsx with the following content:
    #load ".paket/load/net452/FSharp.Data.fsx"
    open FSharp.Data
    type Csv = CsvProvider<"/home/rmunn/Downloads/tmp/csv/FL_insurance_sample.csv">
    let data = Csv.GetSample()
    printfn "%A" data.Headers
    let firstRow = data.Rows |> Seq.head
    printfn "%A" firstRow
In VS Code, selected the entire script file and pressed Alt+Enter to send it to the F# Interactive window.
Here's the output I got:
F# Interactive for F# 4.1
Freely distributed under the Apache 2.0 Open Source License
For help type #help;;
> # silentCd @"/home/rmunn/code/fsharp/tmp/foo";;
- # 1 @"/home/rmunn/code/fsharp/tmp/foo/script.fsx"
- ;;
(snip the copy of my script that F# Interactive echoed)
[Loading /home/rmunn/code/fsharp/tmp/foo/.paket/load/net452/Zlib.Portable.fsx
 Loading /home/rmunn/code/fsharp/tmp/foo/.paket/load/net452/FSharp.Data.fsx]
namespace FSI_0002.Zlib
namespace FSI_0002.FSharp
Some
  [|"policyID"; "statecode"; "county"; "eq_site_limit"; "hu_site_limit";
    "fl_site_limit"; "fr_site_limit"; "tiv_2011"; "tiv_2012";
    "eq_site_deductible"; "hu_site_deductible"; "fl_site_deductible";
    "fr_site_deductible"; "point_latitude"; "point_longitude"; "line";
    "construction"; "point_granularity"|]
(119736, "FL", "CLAY COUNTY", 498960M, 498960M, 498960M, 498960M, 498960M,
 792148.9M, 0M, 9979.2M, 0, 0, 30.102261M, -81.711777M, "Residential", "Masonry",
 1)
type Csv = FSharp.Data.CsvProvider<...>
val data : FSharp.Data.CsvProvider<...>
val firstRow : FSharp.Data.CsvProvider<...>.Row =
  (119736, "FL", "CLAY COUNTY", 498960M, 498960M, 498960M, 498960M, 498960M,
   792148.9M, 0M, 9979.2M, 0, 0, 30.102261M, -81.711777M, "Residential",
   "Masonry", 1)
val it : unit = ()
However, all did not go entirely smoothly. When I then tried to process every row, I got the following exception:
System.Exception: Couldn't parse row 2439 according to schema: Expecting Int32 in fl_site_deductible, got 68817.6
(I've omitted the traceback because knowing which line number in FSharp.Data threw that exception won't be particularly helpful to you.)
The cause of this problem can be seen in the CsvProvider documentation, in the "Controlling the column types" section, which reads:
By default, the CSV type provider checks the first 1000 rows to infer the types, but you can customize it by specifying the InferRows static parameter of CsvProvider. If you specify 0 the entire file will be used.
There are two ways you can solve the "inferred int but should have been decimal" problem. One is to add InferRows=0 to your CsvProvider type definition, so that the whole file is used for inference. The other is to specify an explicit schema that tells CsvProvider the correct type for the columns it would get wrong by looking only at the first 1,000 rows. (If your data set is huge, this is far preferable, since looking through all the rows to infer data types would take far too long.) See the documentation for examples, but you'd do something like Schema="fl_site_deductible=decimal".
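As a rough sketch of both options, reusing the sample path from my script above (the names SamplePath, CsvAllRows and CsvWithSchema are just ones I made up for illustration, and I haven't checked whether other columns need the same override; fr_site_deductible also shows up as an int in the output above, so it might):

```fsharp
#load ".paket/load/net452/FSharp.Data.fsx"
open FSharp.Data

// Assumed location of the sample file; adjust to wherever your copy lives.
[<Literal>]
let SamplePath = "/home/rmunn/Downloads/tmp/csv/FL_insurance_sample.csv"

// Option 1: use every row of the file for type inference (slow on huge files).
type CsvAllRows = CsvProvider<SamplePath, InferRows=0>

// Option 2: keep the default inference, but override the one column
// that the first 1,000 rows got wrong.
type CsvWithSchema = CsvProvider<SamplePath, Schema="fl_site_deductible=decimal">

let data = CsvWithSchema.GetSample()

// Walking the whole sequence forces every row to be parsed, so a column
// that still has the wrong inferred type would throw here.
let rowCount = data.Rows |> Seq.length
printfn "Parsed %d rows" rowCount
```

If another column throws the same kind of exception, add it to the Schema string as a further comma-separated Name=type entry, or fall back to InferRows=0.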
So if you can't get your code to work in Visual Studio, see if VS Code (with the Ionide-Paket, Ionide-FSharp, and Ionide-FAKE extensions) works for you instead.