Merge multiple lists of data together by common ID in F#

Tags:

f#

f#-data

I have multiple lists of data from 4 different sources with a common set of IDs. I would like to merge them together by ID, ending up with a new list that has one item per ID, each holding a single value from every source.

The objects in the output list from each of the 4 sources look something like this:

type data = {ID : int; value : decimal;}

so, for example I would have:

let sourceA = [data1; data2; data3]
let sourceB = [data1; data2; data3]
let sourceC = [data1; data2; data3]
let sourceD = [data1; data2; data3]

(I realize this code is not valid, just trying to give a basic idea... the lists are actually pulled and generated from a database)

I would then like to take sourceA, sourceB, sourceC and sourceD and process them into a list containing objects something like this:

type dataByID = {ID : int; valueA : decimal; valueB : decimal; valueC : decimal; valueD : decimal; }

...so that I can then print them out in a CSV, with the first column being the ID and columns 2 - 5 being data from sources A - D corresponding to the ID in that row.

I'm totally new to F#, so what would be the best way to process this data so that I match up all the source data values by ID?

Adam Haile asked Jan 24 '11 21:01


1 Answer

It seems that you could simply concatenate all the lists and then use Seq.groupBy to get a sequence that pairs each unique ID from the input lists with all the values associated with that ID. This can be done using something like the following (here data1 to data4 stand for the four source lists):

let data = 
  [ data1; data2; data3; data4 ]   // Create list of lists of items 
  |> Seq.concat                    // Concatenate to get a single list of items
  |> Seq.groupBy (fun d -> d.ID)   // Group elements by ID

// id is the ID and values is a sequence with all values
// (that come from any data source)
seq { for id, values in data ->
        id, List.ofSeq values }
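To make the grouping step concrete, here is a minimal self-contained sketch. The sample records and the names sourceA and sourceB are made up for illustration (the real lists would come from the database), and only two sources are shown to keep it short.

```fsharp
type data = { ID : int; value : decimal }

// Hypothetical sample data standing in for the database results
let sourceA : data list = [ { ID = 1; value = 1.0m }; { ID = 2; value = 2.0m } ]
let sourceB : data list = [ { ID = 1; value = 10.0m }; { ID = 2; value = 20.0m } ]

let grouped =
  [ sourceA; sourceB ]                 // list of lists of records
  |> Seq.concat                        // flatten into a single sequence
  |> Seq.groupBy (fun d -> d.ID)       // pair each ID with its records
  |> Seq.map (fun (id, items) ->
       // keep only the decimal values for each ID
       id, items |> Seq.map (fun d -> d.value) |> List.ofSeq)
  |> List.ofSeq

for id, values in grouped do
  printfn "%d: %A" id values
```

Seq.groupBy keeps keys in order of first appearance, so ID 1 comes before ID 2 here. Note that at this point nothing records which source a value came from.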

If you want to associate the source (whether it was data1, data2, etc.) with the value, then you can first use a map operation to tag each value with the index of its data source:

let addIndex i data = 
  data |> Seq.map (fun v -> i, v)

let data = 
  [ addIndex 1 data1;   // addIndex already maps over the whole list
    addIndex 2 data2;
    addIndex 3 data3;
    addIndex 4 data4 ]
  |> Seq.concat
  |> Seq.groupBy (fun (index, d) -> d.ID)

Now, data also contains the index of the data source (from 1 to 4), so when iterating over the values, you can use the index to find out which data source each item comes from. An even nicer version can be written using Seq.mapi to iterate over the list of data sources and add the index to all the values automatically (note that Seq.mapi counts from 0, so the indices then run from 0 to 3):

let data = 
  [ data1; data2; data3; data4 ]
  |> Seq.mapi (fun index data -> addIndex index data)
  |> Seq.concat
  |> Seq.groupBy (fun (index, d) -> d.ID)
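From the indexed groups it is a short step to the dataByID records and CSV rows that the question asks for. The following is only a sketch under some assumptions: the sample data, the merge and toCsvLine helpers, and the CSV header are hypothetical, and a source that has no entry for an ID is filled with 0 (you may prefer an option type or an error instead).

```fsharp
type data = { ID : int; value : decimal }

// Hypothetical sample data; the real lists would come from the database
let sourceA : data list = [ { ID = 1; value = 1.0m }; { ID = 2; value = 2.0m } ]
let sourceB : data list = [ { ID = 1; value = 10.0m }; { ID = 2; value = 20.0m } ]
let sourceC : data list = [ { ID = 1; value = 100.0m }; { ID = 2; value = 200.0m } ]
let sourceD : data list = [ { ID = 1; value = 1000.0m }; { ID = 2; value = 2000.0m } ]

type dataByID =
  { ID : int; valueA : decimal; valueB : decimal; valueC : decimal; valueD : decimal }

// Hypothetical helper: expects the sources in the order A, B, C, D
let merge (sources : data list list) : dataByID list =
  sources
  |> Seq.mapi (fun index items -> items |> Seq.map (fun d -> index, d))
  |> Seq.concat
  |> Seq.groupBy (fun (_, d) -> d.ID)
  |> Seq.map (fun (id, tagged) ->
       // Map from source index (0..3) to the value that source has for this ID
       let lookup = tagged |> Seq.map (fun (index, d) -> index, d.value) |> Map.ofSeq
       // Assumption: a missing value defaults to 0
       let valueOf index = defaultArg (Map.tryFind index lookup) 0m
       { ID = id; valueA = valueOf 0; valueB = valueOf 1
         valueC = valueOf 2; valueD = valueOf 3 })
  |> Seq.sortBy (fun row -> row.ID)
  |> List.ofSeq

// Hypothetical helper: one CSV row per ID
let toCsvLine (row : dataByID) =
  sprintf "%d,%M,%M,%M,%M" row.ID row.valueA row.valueB row.valueC row.valueD

let merged = merge [ sourceA; sourceB; sourceC; sourceD ]

printfn "ID,A,B,C,D"
merged |> List.iter (toCsvLine >> printfn "%s")
```

If an ID appears more than once within the same source, Map.ofSeq keeps the last value; whether that is acceptable depends on the data.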
Tomas Petricek answered Nov 09 '22 13:11