Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In F#, are there any operations like 'melt' or 'cast' in R?

Tags:

r

reshape

f#

Melt and Cast are popular operations to handle data in R. In F# that would be sequences of records of same type or something close to it.

Are you aware of any such functions in F#?

(If not, who would be interested in making some strongly typed version of them...)

More information:

Melt takes a table as input. It has column title (our record fields), and a series of rows. Those column can be grouped into a set of 'identifier' and a set of 'variables'

Melt puts this table in a new canonical form with the columns being now : the identifiers, the column named @"variable", the column named @"value"

If you had 10 'variables' originally, like size, weight, etc.. you will have for each previous record, 10 records in the canonical form, with the values in the column @'variable' being filled with the title of the previous columns from your 'variables'

Cast, conversely, reconstruct a table from a melted one.

A short example in R, melt takes data (dat) that looks like this:

  a          b         c
1 1 0.48411551 0.2372291
2 2 0.58850308 0.3968759
3 3 0.74412592 0.9718320
4 4 0.93060118 0.8665092
5 5 0.01556804 0.2512399

and makes it look like this:

> melt(dat,id.vars = "a")
   a variable      value
1  1        b 0.48411551
2  2        b 0.58850308
3  3        b 0.74412592
4  4        b 0.93060118
5  5        b 0.01556804
6  1        c 0.23722911
7  2        c 0.39687586
8  3        c 0.97183200
9  4        c 0.86650918
10 5        c 0.25123992

cast essentially does the reverse.

Those 2 operations are extremely powerful on a day to day basis. Once you have them it changes your thinking, very much like FP does.

like image 490
nicolas Avatar asked Nov 05 '22 04:11

nicolas


1 Answers

Assuming melt is similar to SQL Server's unpivot, this ought to do the trick:

let melt keys (table: DataTable) = 
  let out = new DataTable()
  let keyCols, otherCols = 
    table.Columns
    |> Seq.cast<DataColumn>
    |> Seq.toArray
    |> Array.partition (fun c -> keys |> Seq.exists (fun k -> k = c.ColumnName))
  for c in keyCols do
    out.Columns.Add(c.ColumnName) |> ignore
  out.Columns.Add("Key", typeof<string>) |> ignore
  out.Columns.Add("Value") |> ignore
  for r in table.Rows do
    for c in otherCols do
      let values = [|
        for c in keyCols do yield r.[c]
        yield box c.ColumnName
        yield r.[c]
      |]
      out.Rows.Add(values) |> ignore
  out

Here's a little test to try it out:

let table = new DataTable()
[|"Country", typeof<string>
  "2001", typeof<int>
  "2002", typeof<int>
  "2003", typeof<int>|]
|> Array.map (fun (name, typ) -> new DataColumn(name, typ))
|> table.Columns.AddRange

[
  "Nigeria", 1, 2, 3
  "UK", 2, 3, 4
]
|> List.iter (fun (a, b, c, d) -> table.Rows.Add(a, b, c, d) |> ignore)

let table2 = table |> melt ["Country"]

table2.Rows
|> Seq.cast<DataRow>
|> Seq.iter (fun r ->
  for (c: DataColumn) in table2.Columns do
    printfn "%A: %A" c.ColumnName r.[c]
  printfn "")

which yields

"Country": "Nigeria"
"Key": "2001"
"Value": "1"

"Country": "Nigeria"
"Key": "2002"
"Value": "2"

...

Assuming cast goes the other way (i.e. pivot), you should be able to take this code and come up with a translation.

If you're doing this a lot, you might find it easier to load your data into SQL Server and use the built-in operators.

like image 72
Daniel Avatar answered Nov 07 '22 21:11

Daniel