I just realized F# records are reference types and how much boxing and unboxing I have going on. I have a lot of tiny records like this:
type InputParam =
| RegionString of string
| RegionFloat of float32
But if I try to tag it with the "Struct" attribute, I get a compiler error stating "FS3204 If a union type has more than one case and is a struct, then all fields within the union type must be given unique names." The language reference shows to create struct discriminated unions like this:
[<Struct>]
type InputParamStruct =
| RegionString of RegionString: string
| RegionFloat of RegionFloat: float32
What is the difference between x of string and x of x: string? How are the fields not unique to begin with? Why doesn't F# default to structs for records?
A discriminated union is a union data structure that holds various objects, with one of the objects identified directly by a discriminant. The discriminant is the first item to be serialized or deserialized. A discriminated union includes both a discriminant and a component.
Discriminated unions are useful for heterogeneous data; data that can have special cases, including valid and error cases; data that varies in type from one instance to another; and as an alternative for small object hierarchies.
Firstly, these aren't Records - they are Discriminated Unions. A Record is a simple aggregate of named data with generated equality/hashing, and making it a struct is also possible but does not come with additional requirements.
The stricter requirements for struct Discriminated Unions are:
The first two points are inherent to being value types. Value and reference types are just different.
The last point is interesting. Consider the following:
type DU1 =
| Case1 of string
| Case2 of float
[<Struct>]
type DU2 =
| Case1 of sval: string
| Case2 of fval: float
In the case of DU1
, there is an inner class for each case, and those contain properties for accessing the underlying data. These properties are named Item1
, Item2
, and so on and since they are encapsulated in an inner class they're unique when accessed.
In the case of DU2
, the sval
and fval
values are laid out flat; there is no inner class that contains them. This is because a goal is performance/size of the struct. The naming strategy for data in a union case (Item1
/Item2
/etc.) doesn't apply because all of the data is laid out flat. And so the design decision was to require unique named cases rather than apply some trickery to kludge together the name of the case itself and some variation of Item1
/Item2
/etc. The uniqueness issue is inherent to the design of unions themselves in the compiler and not just a codegen design choice.
Lastly, this question has another interesting answer:
Why doesn't F# default to structs for records?
Tuples, Records, and DUs in F# can all be marked as [<Struct>]
but are not structs by default. This is because structs are not simply a "make it more efficient" button you can push. Often times you will get worse CPU performance due to excessive copying because your structs are too large. In F#, it is quite normal to have large tuples and very very large records and discriminated unions. Making these structs by default would not be a good choice. Reference types are very powerful and designed to work very well on .NET and shouldn't be avoided by default just because in some cases a struct could result in slightly faster performance.
Whenever you're concerned about performance, never change things just based on assumptions or intuition: use profiling tools like PerfView, dotTrace, or dotMemory; and benchmark small changes with statistical tools like BenchmarkDotNet. Performance is an extremely complicated space and rarely is something simple once you're done accounting for egregious problems that are obviously bad (like O(n^2) algorithms on large data sets or something).
Without question, this should be a struct. It's immutable and 16 bytes. Looking at the disassembly, this reference type:
type InputParam =
| RegionString of string
| RegionFloat of float32
And this reference type:
type InputParam =
| RegionString of RegionString: string
| RegionFloat of RegionFloat: float32
Are functionally identical. The only difference is with how the compiler named things. They both create a subclass called "RegionString" but with different property names -- "RegionString.item" vs "RegionString.RegionString".
When you convert the first example into a struct, it does away with the subclasses and tries to stick 2 "item" properties on the record which causes the FS3204 unique name error.
As far as performance, you should use structs on every tiny type like these when composing. Consider this example script:
type Name = Name of string
let ReverseName (Name s) =
s.ToCharArray() |> Array.rev |> System.String |> Name
[<Struct>]
type StrName = StrName of string
let StrReverseName (StrName s) =
s.ToCharArray() |> Array.rev |> System.String |> StrName
#time
Array.init 10000000 (fun x -> Name (x.ToString()))
|> Array.map ReverseName
|> ignore
#time
#time
Array.init 10000000 (fun x -> StrName (x.ToString()))
|> Array.map StrReverseName
|> ignore
#time
sizeof<Name>
sizeof<StrName>
The first one wraps a ref type in a ref type which doubled the performance hit:
Real: 00:00:04.637, CPU: 00:00:04.703, GC gen0: 340, gen1: 104, gen2: 7
...
Real: 00:00:02.620, CPU: 00:00:02.625, GC gen0: 257, gen1: 73, gen2: 1
...
val it : int = 8
val it : int = 8
Functional domain modeling is awesome, but you have to keep in mind that these have the same performance overhead:
let c = CustomerID 5
let i = 5 :> obj
The recommendation is anything immutable under 16 bytes should be a struct. If it was over 16 bytes, you had to look at the behavior. If it's being passed around a lot, you might be better off passing the 64-bit ref pointer and taking the ref overhead hit. But for internal data when composing types or within a function, stick with structs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With