Given the program:
import Debug.Trace main = print $ trace "hit" 1 + trace "hit" 1
If I compile with ghc -O
(7.0.1 or higher) I get the output:
hit 2
i.e. GHC has used common sub-expression elimination (CSE) to rewrite my program as:
main = print $ let x = trace "hit" 1 in x + x
If I compile with -fno-cse
then I see hit
appearing twice.
Is it possible to avoid CSE by modifying the program? Is there any sub-expression e
for which I can guarantee e + e
will not be CSE'd? I know about lazy
, but can't find anything designed to inhibit CSE.
The background of this question is the cmdargs library, where CSE breaks the library (due to impurity in the library). One solution is to ask users of the library to specify -fno-cse
, but I'd prefer to modify the library.
(D) x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination.
In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a single variable holding the computed value.
Common Subexpression Elimination is an optimization that searches for instances of identical expressions, and replaces them with a single variable holding the computed value. For instance, consider the following code: a <- 1 / (8 + 8 + 1 + 9 * 1 ^ 8) b <- (8 + 8 + 1 + 9 * 1 ^ 8) * 2.
How about removing the source of the trouble -- the implicit effect -- by using a sequencing monad that introduces that effect? E.g. the strict identity monad with tracing:
data Eval a = Done a | Trace String a instance Monad Eval where return x = Done x Done x >>= k = k x Trace s a >>= k = trace s (k a) runEval :: Eval a -> a runEval (Done x) = x track = Trace
now we can write stuff with a guaranteed ordering of the trace
calls:
main = print $ runEval $ do t1 <- track "hit" 1 t2 <- track "hit" 1 return (t1 + t2)
while still being pure code, and GHC won't try to get to clever, even with -O2
:
$ ./A hit hit 2
So we introduce just the computation effect (tracing) sufficient to teach GHC the semantics we want.
This is extremely robust to compile optimizations. So much so that GHC optimizes the math to 2
at compile time, yet still retains the ordering of the trace
statements.
As evidence of how robust this approach is, here's the core with -O2
and aggressive inlining:
main2 = case Debug.Trace.trace string trace2 of Done x -> case x of I# i# -> $wshowSignedInt 0 i# [] Trace _ _ -> err trace2 = Debug.Trace.trace string d d :: Eval Int d = Done n n :: Int n = I# 2 string :: [Char] string = unpackCString# "hit"
So GHC has done everything it could to optimize the code -- including computing the math statically -- while still retaining the correct tracing.
References: the useful Eval
monad for sequencing was introduced by Simon Marlow.
Reading the source code to GHC, the only expressions that aren't eligible for CSE are those which fail the exprIsBig
test. Currently that means the Expr
values Note
, Let
and Case
, and expressions which contain those.
Therefore, an answer to the above question would be:
unit = reverse "" `seq` () main = print $ trace "hit" (case unit of () -> 1) + trace "hit" (case unit of () -> 1)
Here we create a value unit
which resolves to ()
, but which GHC can't determine the value for (by using a recursive function GHC can't optimise away - reverse
is just a simple one to hand). This means GHC can't CSE the trace
function and it's 2 arguments, and we get hit
printed twice. This works with both GHC 6.12.4 and 7.0.3 at -O2
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With