F# collection type for mixed types

This question is coming from someone who is working on making the transition from R to F#. I fully acknowledge my approach here may be wrong, so I am looking for the F# way of doing this. I have a situation where I want to iterate through a set of XML files, parse them, and extract several values to identify which ones need further processing. My natural inclination is to map over the array of XML data, exampleData in this case, parse each using the RawDataProvider type provider, and finally create a Map object for each file containing the parsed XML, the Status value from the XML, and the ItemId value.

It turns out that the Map type in F# is not like a List in R. Lists in R are essentially hashmaps which can hold values of mixed types. The Map type in F# does not appear to support storing mixed types. I have found mixed-type lists incredibly useful in my R work and am looking for the right F# collection for this.

Or, am I thinking about this all wrong? This is a very natural way for me to process data in R so I would expect there would be a way to do it in F# as well. The assumption is that I am going to do further analysis and add additional elements of data to these collections.

Update: This seems like such a simple use case that there must be an idiomatic way of doing this in F# without having to define a Record type for each step of the analysis. I have updated my example to further illustrate what I am trying to do. I want to return an Array of the Map objects that I have analyzed:

type RawDataProvider = XmlProvider<"""<product Status="Good" ItemId="123" />""">        

let exampleData = [| """<product Status="Good" ItemId="123" />"""; """<product Status="Bad" ItemId="456" />"""; """<product Status="Good" ItemId="789" />"""|]

let dataResult =
            exampleData
            |> Array.map(fun fileData -> RawDataProvider.Parse(fileData))
            |> Array.map(fun xml -> Map.empty.Add("xml", xml).Add("Status", xml.Status).Add("ItemId", xml.ItemId))
            |> Array.map(fun elem -> elem.["calc1Value"] = calc1 elem["itemId"])
            |> Array.map(fun elem -> elem.["calc2"] = calc2 elem.["ItemId"] elem.["calc1Value"])
Matthew Crews asked Mar 08 '16 18:03



1 Answer

This is what I would consider almost idiomatic here - I'm keeping the same shape as in your example so you can match the two:

let dataResult =
    exampleData
    |> Array.map(fun fileData -> RawDataProvider.Parse(fileData))  
    |> Array.map(fun xml -> xml, calc1 xml.ItemId)
    |> Array.map(fun (xml, calcedValue1) -> xml, calcedValue1, calc2 xml.ItemId calcedValue1)

What XmlProvider really gives you is not just XML parsing but a strongly typed representation of the XML. This is better than putting the data in a map, because the compiler can check that your program is using the data correctly. For instance, it wouldn't let you mix up itemId and ItemId, as happened in your code snippet ;)
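To make the difference concrete, here is a minimal sketch comparing the two styles. The Product record is a hypothetical stand-in for the type XmlProvider would generate, so the snippet runs without FSharp.Data; it is not the provider's actual output.

```fsharp
// Hypothetical stand-in for the type the XmlProvider would generate.
type Product = { Status: string; ItemId: int }

let typed = { Status = "Good"; ItemId = 123 }

// A Map can only hold mixed types if every value is boxed to obj.
let untyped : Map<string, obj> =
    Map.ofList [ "Status", box typed.Status
                 "ItemId", box typed.ItemId ]

// Typed access: the compiler checks the field name and knows the result is an int.
let idFromRecord : int = typed.ItemId

// Map access: the key is just a string, so the "itemId" typo still compiles.
// The mistake only surfaces at run time, here as a missing key.
let idFromMap : int option =
    untyped |> Map.tryFind "itemId" |> Option.map unbox<int>
```

With the record, writing typed.itemId would be rejected at compile time; with the map, the misspelled key is accepted and the lookup simply finds nothing.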

For the values you calculate in the following steps, you could use tuples instead of a record. In general, records are preferred to tuples as they lead to more readable code, but combining related values of different types into ad-hoc aggregates is really the scenario where using tuples shines.
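As a rough illustration of that trade-off, the sketch below builds the same result once as a tuple and once as a record. calc1 and calc2 are not shown in the question, so the bodies here are made-up placeholders, and the Analysed record name is likewise hypothetical.

```fsharp
// Hypothetical stand-ins: the real calc1/calc2 are not shown in the question.
let calc1 (itemId: int) = itemId * 2
let calc2 (itemId: int) (calc1Value: int) = itemId + calc1Value

// Tuple: no type declaration needed, but the fields are positional.
let asTuple itemId =
    let c1 = calc1 itemId
    itemId, c1, calc2 itemId c1

// Record: one declaration up front, and every field has a name.
type Analysed = { ItemId: int; Calc1: int; Calc2: int }

let asRecord itemId =
    let c1 = calc1 itemId
    { ItemId = itemId; Calc1 = c1; Calc2 = calc2 itemId c1 }

// Reading the third value: by position for the tuple, by name for the record.
let (_, _, fromTuple) = asTuple 123
let fromRecord = (asRecord 123).Calc2
```

The tuple version is shorter to write for a throwaway intermediate step; the record version is easier to read once the values travel further through the program.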

Now, I said almost idiomatic - I would split parsing and processing of the parsed XML into separate functions, and calculate both the calc1 and calc2 results in a single function instead of composing two Array.maps. Something like this:

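// Parsing step, kept separate from processing
let parsedData =
    exampleData
    |> Array.map(fun fileData -> RawDataProvider.Parse(fileData))

// Processing step: both calculations in one function
let dataResult = 
    parsedData
    |> Array.map(fun xml -> 
        let calced1 = calc1 xml.ItemId
        xml, calced1, calc2 xml.ItemId calced1)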
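If you are on F# 4.6 or later, anonymous records are another option for growing an ad-hoc aggregate step by step without declaring a record type for each stage. The following is only a sketch under several assumptions: it uses System.Xml.Linq in place of XmlProvider so that it runs without FSharp.Data, and calc1 and calc2 are made-up placeholders because the question does not show them.

```fsharp
open System.Xml.Linq

// Hypothetical stand-ins: the real calc1/calc2 are not shown in the question.
let calc1 (itemId: int) = itemId * 2
let calc2 (itemId: int) (calc1Value: int) = itemId + calc1Value

let exampleData =
    [| """<product Status="Good" ItemId="123" />"""
       """<product Status="Bad" ItemId="456" />"""
       """<product Status="Good" ItemId="789" />""" |]

// Parsing step: XElement is used here instead of the type provider (an assumption
// made to keep the sketch self-contained), so the attributes are read by name.
let parse (fileData: string) =
    let xml = XElement.Parse(fileData)
    {| Xml = xml
       Status = xml.Attribute(XName.Get("Status")).Value
       ItemId = int (xml.Attribute(XName.Get("ItemId")).Value) |}

// Processing step: copy-and-update on an anonymous record can add new fields,
// producing a wider anonymous record type without any declaration.
let dataResult =
    exampleData
    |> Array.map parse
    |> Array.map (fun p ->
        let c1 = calc1 p.ItemId
        {| p with Calc1 = c1; Calc2 = calc2 p.ItemId c1 |})
```

Each element of dataResult then exposes Xml, Status, ItemId, Calc1 and Calc2 as typed, named fields, which is close to the "keep adding elements" workflow from the question while still being checked by the compiler.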

If you're coming from an R background, you might want to check out Deedle for an alternative approach. It gives you a workflow in F# similar to what you have in R.

scrwtp answered Sep 24 '22 03:09