For example, here is simple Haskell enum data type:
data Bool = False | True
There's -XUnboxedSums
extension since GHC-8.2.1 that allows to define sum types in more memory-efficient way. Here is a quote from documentation:
In the degenerate case where all the alternatives have zero width, such as the
Bool
-like(# (# #) | (# #) #)
, the unboxed sum layout only has an Int32 tag field (i.e., the whole thing is represented by an integer).
I wonder, whether this representation is more efficient than using simple plain Haskell enum data type?
And here is the full documentation:
The two types cannot be used interchangeable.
The type
data Bool = False | True
is a lifted type. This means that a variable x :: Bool
can still be unevaluated (e.g. a thunk). It will therefore be represented as a pointer.
In contrast
(# (# #) | (# #) #)
is unlifted, and really can only have the two values, and will be represented as just a machine integer.
This does, however not mean that the latter is more space efficient: Because True
and False
are nullary constructors (they take no arguments), they exist once and for all in the static code of your program, and the pointers just point to them. So the cost is one machine word in all cases.
The unboxed variant may be slightly more efficient than Bool
:
For Bool
, the code first has to check is this already evaluated?, and then it can branch on which constructor it is.
The branching is pretty efficient: The code does not have to follow the pointer for that: The “pointer tag” in the low few bits of the pointer address will indicate which constructor it is. There is a slight cost for masking the other bits, though.
In the unboxed variant, well, it's just 0
or 1
, and no thunk check is needed.
jberryman says: as of Jan 2021 the following don't appear to be implemented yet, but this is how things might one day work.
If you define a data type like this
data Foo = Foo {-# UNPACK #-} !Bool
you should get exactly the same result as if you’d write
data Foo = Foo (# (# #) | (# #) #)
as that was the main purpose of the unboxed sum extension.
If you define a function that is strict in a Bool
argument, e.g.
foo True = "Hello"
foo False = "World!"
then I could imagine (but did not check) that the compiler optimizes that into a worker that takes a (# (# #) | (# #) #)
and a wrapper that takes care of the thunk-check. The wrapper might then inline in use-sites where foo
is used, and everything might end up in terms of (# (# #) | (# #) #)
.
Don’t bother.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With