
Struct Attribute on Discriminated Unions

I just realized F# records are reference types and how much boxing and unboxing I have going on. I have a lot of tiny records like this:

type InputParam =
    | RegionString of string
    | RegionFloat of float32

But if I try to tag it with the "Struct" attribute, I get a compiler error stating "FS3204: If a union type has more than one case and is a struct, then all fields within the union type must be given unique names." The language reference shows how to create struct discriminated unions like this:

[<Struct>]
type InputParamStruct =
    | RegionString of RegionString: string
    | RegionFloat of RegionFloat: float32

What is the difference between x of string and x of x: string? How are the fields not unique to begin with? Why doesn't F# default to structs for records?

EricP asked Jan 14 '20 16:01



2 Answers

Firstly, these aren't Records; they are Discriminated Unions. A Record is a simple aggregate of named data with generated equality/hashing, and it can also be made a struct without any of these additional requirements.

The stricter requirements for struct Discriminated Unions are:

  • No callable default constructor
  • No cyclic references / no recursive definitions
  • Multi-case unions must give every field a unique name

The first two points are inherent to being value types. Value and reference types are just different.
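For example, a value type always has an all-zero default that the runtime produces without running any constructor (such as when an array is allocated), so a struct union has no way to supply a default constructor of its own. Likewise, a struct cannot contain itself, because its size would be unbounded. The sketch below uses a hypothetical `Shape` type to show what that zeroed default looks like. The fact that it decodes as the first case with a zero field reflects how the current compiler numbers case tags, so treat it as an observation rather than a language guarantee.

```fsharp
// Hypothetical two-case struct union; its fields must be named (FS3204).
[<Struct>]
type Shape =
    | Circle of radius: float
    | Square of side: float

let describe (s: Shape) =
    match s with
    | Circle r -> sprintf "Circle %.1f" r
    | Square x -> sprintf "Square %.1f" x

// The runtime zero-fills arrays of value types; no F# constructor runs here.
let shapes : Shape[] = Array.zeroCreate 3

// With the current compiler, the zeroed tag selects the first case (Circle) with 0.0.
shapes |> Array.iter (describe >> printfn "%s")
```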

The last point is interesting. Consider the following:

type DU1 =
    | Case1 of string
    | Case2 of float

[<Struct>]
type DU2 =
    | Case1 of sval: string
    | Case2 of fval: float

In the case of DU1, there is an inner class for each case, and those classes contain properties for accessing the underlying data. These properties are named Item (or Item1, Item2, and so on when a case has several fields), and since they are encapsulated in an inner class, they're unique when accessed.

In the case of DU2, the sval and fval values are laid out flat; there is no inner class that contains them. This is because the goal of a struct union is performance and a compact size. The naming strategy for data in a union case (Item1/Item2/etc.) doesn't apply because all of the data is laid out flat. So the design decision was to require uniquely named fields rather than apply some trickery that combines the name of the case with some variation of Item1/Item2/etc. The uniqueness issue is inherent to the design of unions themselves in the compiler and not just a codegen design choice.
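One way to see this difference from F# is to inspect the two types with `FSharp.Reflection`. The sketch below lists each case's field names and the type that declares each field's property. The `fieldLayout` helper is hypothetical. For DU1, the expected result is fields named `Item`, declared on the per-case inner classes. For DU2, the expected result is `sval` and `fval`, declared on DU2 itself.

```fsharp
open Microsoft.FSharp.Reflection

// Reference-type union: unnamed fields, so the compiler names them Item.
type DU1 =
    | Case1 of string
    | Case2 of float

// Struct union: no inner classes, so each field needs a distinct name.
[<Struct>]
type DU2 =
    | Case1 of sval: string
    | Case2 of fval: float

// Hypothetical helper: for each case, (field name, name of the type declaring its property).
let fieldLayout (t: System.Type) =
    FSharpType.GetUnionCases t
    |> Array.map (fun case ->
        case.Name,
        case.GetFields() |> Array.map (fun p -> p.Name, p.DeclaringType.Name))

printfn "DU1: %A" (fieldLayout typeof<DU1>)
printfn "DU2: %A" (fieldLayout typeof<DU2>)
```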

Lastly, this question has another interesting answer:

Why doesn't F# default to structs for records?

Tuples (with the struct keyword), Records, and DUs (with [<Struct>]) in F# can all be made structs, but they are not structs by default. This is because structs are not simply a "make it more efficient" button you can push. Often you will get worse CPU performance due to excessive copying because your structs are too large. In F#, it is quite normal to have large tuples and very large records and discriminated unions, so making these structs by default would not be a good choice. Reference types are very powerful, are designed to work very well on .NET, and shouldn't be avoided by default just because in some cases a struct could result in slightly faster performance.
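For reference, here is a minimal sketch of the opt-in syntax for all three kinds (available since F# 4.1). The type names `Point` and `Meters` are hypothetical. Note that a single-case struct union does not need named fields, because FS3204 only applies to unions with more than one case.

```fsharp
// Struct tuple: the `struct` keyword makes it a System.ValueTuple.
let pair = struct (1, 2.0)

// Struct record: stored inline and copied by value.
[<Struct>]
type Point = { X: float; Y: float }

// Single-case struct union: only one case, so field names are optional.
[<Struct>]
type Meters = Meters of float

let p = { X = 1.0; Y = 2.0 }
let m = Meters 3.0

printfn "%b %b %b" (pair.GetType().IsValueType) (typeof<Point>.IsValueType) (typeof<Meters>.IsValueType)

// For contrast, an ordinary tuple is a reference type (System.Tuple).
printfn "%b" ((1, 2.0).GetType().IsValueType)
```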

Whenever you're concerned about performance, never change things just based on assumptions or intuition: use profiling tools like PerfView, dotTrace, or dotMemory, and benchmark small changes with statistical tools like BenchmarkDotNet. Performance is an extremely complicated space, and it is rarely simple once you have accounted for the egregious problems that are obviously bad (like O(n^2) algorithms on large data sets).

Phillip Carter answered Nov 10 '22 01:11


Without question, this should be a struct. It's immutable and 16 bytes. Looking at the disassembly, this reference type:

type InputParam =
    | RegionString of string
    | RegionFloat of float32

And this reference type:

type InputParam =
    | RegionString of RegionString: string
    | RegionFloat of RegionFloat: float32

Are functionally identical. The only difference is how the compiler names things. They both create a nested subclass called "RegionString", but with different property names: "RegionString.Item" versus "RegionString.RegionString".

When you convert the first example into a struct, the compiler does away with the subclasses and would have to put two "Item" properties on the same struct, which causes the FS3204 unique-name error.
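Once the fields are named, the struct version is used exactly like the reference version. The names only affect the compiled property names, not construction or pattern matching. The minimal sketch below uses the question's two types. The `toStruct` and `describeParam` helpers are hypothetical. Case names are qualified because both types declare cases called `RegionString` and `RegionFloat`.

```fsharp
// The question's reference-type union: no field names needed.
type InputParam =
    | RegionString of string
    | RegionFloat of float32

// Struct version: each field gets a unique name, here reusing the case name.
[<Struct>]
type InputParamStruct =
    | RegionString of RegionString: string
    | RegionFloat of RegionFloat: float32

// Hypothetical conversion; qualified names pick the right type's cases.
let toStruct (p: InputParam) =
    match p with
    | InputParam.RegionString s -> InputParamStruct.RegionString s
    | InputParam.RegionFloat f -> InputParamStruct.RegionFloat f

// Pattern matching looks the same as for the reference version.
let describeParam (p: InputParamStruct) =
    match p with
    | InputParamStruct.RegionString s -> "string: " + s
    | InputParamStruct.RegionFloat f -> "float: " + string f

printfn "%s" (describeParam (toStruct (InputParam.RegionFloat 2.5f)))
```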

As far as performance goes, you should use structs for every tiny type like these when composing. Consider this example script:

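// Reference-type wrapper: every Name is a separate heap object.
type Name = Name of string
let ReverseName (Name s) =
    s.ToCharArray() |> Array.rev |> System.String |> Name

// Struct wrapper: the string reference is stored inline, with no extra object.
[<Struct>]
type StrName = StrName of string
let StrReverseName (StrName s) =
    s.ToCharArray() |> Array.rev |> System.String |> StrName

// #time toggles FSI timing on and off around each run.
#time
Array.init 10000000 (fun x -> Name (x.ToString()))
|> Array.map ReverseName
|> ignore
#time

#time
Array.init 10000000 (fun x -> StrName (x.ToString()))
|> Array.map StrReverseName
|> ignore
#time

// Size of one array element: a reference for Name, the struct itself for StrName.
sizeof<Name>
sizeof<StrName>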

The first version wraps a reference type (string) in another reference type, which nearly doubled the run time and caused more garbage collections:

Real: 00:00:04.637, CPU: 00:00:04.703, GC gen0: 340, gen1: 104, gen2: 7
...
Real: 00:00:02.620, CPU: 00:00:02.625, GC gen0: 257, gen1: 73, gen2: 1
...
val it : int = 8
val it : int = 8
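Wall-clock timings like these vary between runs and machines. What stays stable is the number of heap allocations. The sketch below measures allocated bytes instead of time. It assumes .NET Core or .NET 5 and later, where `GC.GetAllocatedBytesForCurrentThread` is available. The `allocatedBy` helper is hypothetical, and for real measurements BenchmarkDotNet's memory diagnoser is the more reliable tool.

```fsharp
open System

// Same two wrappers as in the script.
type Name = Name of string

[<Struct>]
type StrName = StrName of string

// Hypothetical helper: bytes allocated on this thread while running f.
let allocatedBy (f: unit -> 'T) =
    let before = GC.GetAllocatedBytesForCurrentThread()
    let result = f ()
    let after = GC.GetAllocatedBytesForCurrentThread()
    GC.KeepAlive(box result) // keep the result alive until after the measurement
    after - before

let n = 100000

// Both versions allocate a string per element; only the Name version also
// allocates a separate wrapper object per element.
let refBytes = allocatedBy (fun () -> Array.init n (fun i -> Name (string i)))
let structBytes = allocatedBy (fun () -> Array.init n (fun i -> StrName (string i)))

printfn "reference wrapper: %d bytes, struct wrapper: %d bytes" refBytes structBytes
```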

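Functional domain modeling is awesome, but keep in mind that these two lines have the same kind of overhead, since each allocates a small object on the heap:

type CustomerID = CustomerID of int

let c = CustomerID 5   // allocates a CustomerID object
let i = 5 :> obj       // boxes the int into an object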
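When the wrapper is small and immutable, marking it `[<Struct>]` keeps the domain-modeling syntax while storing the int inline instead of in a separate object. In the sketch below, `CustomerIDStruct` and `getId` are hypothetical names used for illustration.

```fsharp
// Reference-type wrapper: each value is its own heap object, much like a boxed int.
type CustomerID = CustomerID of int

// Hypothetical struct version: same syntax at use sites, but the int is stored inline.
[<Struct>]
type CustomerIDStruct = CustomerIDStruct of int

// Single-case patterns unwrap the value directly in the parameter.
let getId (CustomerIDStruct value) = value

// The array stores the struct values themselves, not pointers to separate objects.
let ids = Array.init 5 CustomerIDStruct
printfn "%d" (Array.sumBy getId ids)
```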

The recommendation is that anything immutable and under 16 bytes should be a struct. If it is over 16 bytes, you have to look at how it behaves: if it is passed around a lot, you might be better off passing the 64-bit reference and taking the reference-type overhead. But for internal data, when composing types or within a function, stick with structs.
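To check where a type falls relative to the 16-byte guideline, F#'s `sizeof` operator reports its size in bytes. For a reference type, it reports the size of the reference itself. The sketch below uses two hypothetical struct records, `Point2D` and `Point3D`.

```fsharp
// Hypothetical struct records used to check sizes against the 16-byte guideline.
[<Struct>]
type Point2D = { X: float; Y: float }                // two 8-byte floats

[<Struct>]
type Point3D = { X3: float; Y3: float; Z3: float }   // three 8-byte floats

printfn "Point2D: %d bytes" sizeof<Point2D>   // 16: at the limit
printfn "Point3D: %d bytes" sizeof<Point3D>   // 24: over it, so look at how it is used

// For a reference type, sizeof gives the size of the reference that gets copied
// when the value is passed around (8 bytes in a 64-bit process).
printfn "obj reference: %d bytes" sizeof<obj>
```

Size is only half of the guideline. How often a value is copied, for example when passed to functions or stored in collections, matters just as much.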

EricP answered Nov 10 '22 00:11