Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strictness of dataToTag argument

Tags:

haskell

ghc

In GHC.Prim, we find a magical function named dataToTag#:

dataToTag# :: a -> Int#

It turns a value of any type into an integer based on the data constructor it uses. This is used to speed up derived implementations of Eq, Ord, and Enum. In the GHC source, the docs for dataToTag# explain that the argument should already by evaluated:

The dataToTag# primop should always be applied to an evaluated argument. The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:

getTag :: a -> Int#
getTag !x = dataToTag# x

It makes total sense to me that we need to force x's evaluation before dataToTag# is called. What I do not get is why the bang pattern is sufficient. The definition of getTag is just syntactic sugar for:

getTag :: a -> Int#
getTag x = x `seq` dataToTag# x

But let's turn to the docs for seq:

A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.

In the Control.Parallel module from the parallel package, the docs elaborate further:

... seq is strict in both its arguments, so the compiler may, for example, rearrange a `seq` b into b `seq` a `seq` b ...

How is it that getTag is guaranteed to behave work, given that seq is insufficient for controlling evaluation order?

like image 753
Andrew Thaddeus Martin Avatar asked Jul 11 '17 19:07

Andrew Thaddeus Martin


1 Answers

GHC tracks certain information about each primop. One key datum is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.

If a primop can fail, GHC will restrict certain transformations around it, and in particular won't float it out of any case expressions. It would be rather bad, for example, if

if n < bound
then unsafeIndex c n
else error "out of range"

were compiled to

case unsafeIndex v n of
  !x -> if n < bound
        then x
        else error "out of range"

One of these bottoms is an exception; the other is a segfault.

dataToTag# is marked can_fail. So GHC sees (in Core) something like

getTag = \x -> case x of
           y -> dataToTag# y

(Note that case is strict in Core.) Because dataToTag# is marked can_fail, it won't be floated out of any case expressions.

like image 148
dfeuer Avatar answered Nov 02 '22 23:11

dfeuer