Melt and Cast are popular operations to handle data in R. In F# that would be sequences of records of same type or something close to it.
Are you aware of any such functions in F#?
(If not, who would be interested in making some strongly typed version of them...)
More information:
Melt takes a table as input. It has column title (our record fields), and a series of rows. Those column can be grouped into a set of 'identifier' and a set of 'variables'
Melt puts this table in a new canonical form with the columns being now : the identifiers, the column named @"variable", the column named @"value"
If you had 10 'variables' originally, like size, weight, etc.. you will have for each previous record, 10 records in the canonical form, with the values in the column @'variable' being filled with the title of the previous columns from your 'variables'
Cast, conversely, reconstruct a table from a melted one.
A short example in R, melt
takes data (dat
) that looks like this:
a b c
1 1 0.48411551 0.2372291
2 2 0.58850308 0.3968759
3 3 0.74412592 0.9718320
4 4 0.93060118 0.8665092
5 5 0.01556804 0.2512399
and makes it look like this:
> melt(dat,id.vars = "a")
a variable value
1 1 b 0.48411551
2 2 b 0.58850308
3 3 b 0.74412592
4 4 b 0.93060118
5 5 b 0.01556804
6 1 c 0.23722911
7 2 c 0.39687586
8 3 c 0.97183200
9 4 c 0.86650918
10 5 c 0.25123992
cast
essentially does the reverse.
Those 2 operations are extremely powerful on a day to day basis. Once you have them it changes your thinking, very much like FP does.
Assuming melt
is similar to SQL Server's unpivot
, this ought to do the trick:
let melt keys (table: DataTable) =
let out = new DataTable()
let keyCols, otherCols =
table.Columns
|> Seq.cast<DataColumn>
|> Seq.toArray
|> Array.partition (fun c -> keys |> Seq.exists (fun k -> k = c.ColumnName))
for c in keyCols do
out.Columns.Add(c.ColumnName) |> ignore
out.Columns.Add("Key", typeof<string>) |> ignore
out.Columns.Add("Value") |> ignore
for r in table.Rows do
for c in otherCols do
let values = [|
for c in keyCols do yield r.[c]
yield box c.ColumnName
yield r.[c]
|]
out.Rows.Add(values) |> ignore
out
Here's a little test to try it out:
let table = new DataTable()
[|"Country", typeof<string>
"2001", typeof<int>
"2002", typeof<int>
"2003", typeof<int>|]
|> Array.map (fun (name, typ) -> new DataColumn(name, typ))
|> table.Columns.AddRange
[
"Nigeria", 1, 2, 3
"UK", 2, 3, 4
]
|> List.iter (fun (a, b, c, d) -> table.Rows.Add(a, b, c, d) |> ignore)
let table2 = table |> melt ["Country"]
table2.Rows
|> Seq.cast<DataRow>
|> Seq.iter (fun r ->
for (c: DataColumn) in table2.Columns do
printfn "%A: %A" c.ColumnName r.[c]
printfn "")
which yields
"Country": "Nigeria"
"Key": "2001"
"Value": "1"
"Country": "Nigeria"
"Key": "2002"
"Value": "2"
...
Assuming cast
goes the other way (i.e. pivot
), you should be able to take this code and come up with a translation.
If you're doing this a lot, you might find it easier to load your data into SQL Server and use the built-in operators.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With