In GHC.Prim
, we find a magical function named dataToTag#:
dataToTag# :: a -> Int#
It turns a value of any type into an integer based on the data constructor it uses. This is used to speed up derived implementations of Eq
, Ord
, and Enum
. In the GHC source, the docs for dataToTag#
explain that the argument should already by evaluated:
The dataToTag# primop should always be applied to an evaluated argument. The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
getTag :: a -> Int# getTag !x = dataToTag# x
It makes total sense to me that we need to force x
's evaluation before dataToTag#
is called. What I do not get is why the bang pattern is sufficient. The definition of getTag
is just syntactic sugar for:
getTag :: a -> Int#
getTag x = x `seq` dataToTag# x
But let's turn to the docs for seq:
A note on evaluation order: the expression
seq a b
does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.
In the Control.Parallel
module from the parallel
package, the docs elaborate further:
... seq is strict in both its arguments, so the compiler may, for example, rearrange
a `seq` b
intob `seq` a `seq` b
...
How is it that getTag
is guaranteed to behave work, given that seq
is insufficient for controlling evaluation order?
GHC tracks certain information about each primop. One key datum is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.
If a primop can fail, GHC will restrict certain transformations around it, and in particular won't float it out of any case
expressions. It would be rather bad, for example, if
if n < bound
then unsafeIndex c n
else error "out of range"
were compiled to
case unsafeIndex v n of
!x -> if n < bound
then x
else error "out of range"
One of these bottoms is an exception; the other is a segfault.
dataToTag#
is marked can_fail. So GHC sees (in Core) something like
getTag = \x -> case x of
y -> dataToTag# y
(Note that case
is strict in Core.) Because dataToTag#
is marked can_fail, it won't be floated out of any case
expressions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With