I've run into some unexpected behavior using polymorphic records in Haskell, where some values are not cached when I expect them to be cached.
Here is a minimal example:
{-# LANGUAGE RankNTypes #-}
import Debug.Trace
-- Prints out two "hello"s
data Translation = Trans { m :: forall a . Floating a => a }
g :: Floating a => a -> a
g x = x + 1
f :: Floating a => a -> a
f x = trace "hello" $ x - 2.0
-- Only one "hello"
-- data Translation = Trans { m :: Float }
--
-- f :: Float -> Float
-- f x = trace "hello" $ x - 2.0
main :: IO ()
main = do
let trans = Trans { m = f 1.5 }
putStrLn $ show $ m trans
putStrLn $ show $ m trans
In the example, I thought if the value f 1.5
was computed and stored in the field m
, on the next time it is accessed, it would not be computed again. However, it seems to be recomputed on every access to the record field, as shown by the fact that "hello" is printed twice.
On the other hand, if we remove the polymorphism from the field, the value is cached as expected, and "hello" is only printed once.
I suspect this is due to the interaction of typeclasses (being treated as records) preventing memoization. However, I don't fully understand why.
I realized that compiling with -O2 makes the problem go away, however, this behavior occurs in a much larger system where compiling with -O2 does not seem to have any effect, therefore I'd like to understand the root cause of the problem, so I can fix the performance issues in the larger system.
Hold my beer.
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ConstraintKinds #-}
import Debug.Trace
data Dict c where Dict :: c => Dict c
-- An isomorphism between explicit dictionary-passing style (Dict c -> a)
-- and typeclass constraints (c => a) exists:
from :: (c => a) -> (Dict c -> a)
from v Dict = v
to :: (Dict c -> a) -> (c => a)
to f = f Dict
data Translation = Trans { m :: forall a . Floating a => a }
f1, f2 :: Dict (Floating a) -> a -> a
f1 = trace "hello" $ \Dict x -> x - 2.0
f2 = \Dict -> trace "hello" $ \x -> x - 2.0
main = do
let trans1 = Trans { m = to (flip f1 1.5) }
trans2 = Trans { m = to (flip f2 1.5) }
putStrLn "trans1"
print (m trans1)
print (m trans1)
putStrLn "trans2"
print (m trans2)
print (m trans2)
Take a second to predict what this will output before you run it. Then go ask your GHC if she agrees with your guess.
Clear as mud?
The basic distinction you need to draw here is right here in this significantly simplified example:
> g = trace "a" $ \() -> trace "b" ()
> g ()
a
b
()
> g ()
b
()
There is a separate notion of caching a function and caching its output. The latter is, simply, never done in GHC (though see discussion of what's going on with your optimized version below). The former may sound dumb, but it in fact is not so dumb as you might think; you could imagine writing a function which is, say, id
if the collatz conjecture is true and not
otherwise. In such a situation, it makes complete sense to only test the collatz conjecture once, and then cache whether we should behave as id
or not
forever afterwards.
Once you understand this basic fact, the next leap you must believe is that in GHC, typeclass constraints are compiled to functions. (The arguments to the function are typeclass dictionaries telling how each of the typeclass' methods behave.) GHC itself manages constructing and passing these dictionaries around for you, and in most cases it's quite transparent to the user.
But the upshot of this compilation strategy is this: a polymorphic but typeclass-constrained type is a function even if it doesn't appear to have function arrows in it. That is,
f 1.5 :: Floating a => a
looks like a plain old value; but in fact it is a function which takes a Floating a
dictionary and produces a value of type a
. So any computations that go into computing the value a
are redone afresh each time this function is applied (read: used at a specific monomorphic type) because, after all, the precise value chosen depends critically on how the typeclass' methods behave.
This leaves only the question of why optimizations changed things in your situation. There I believe what happened is called "specialization", in which the compiler will try to notice when polymorphic things get used at a statically-known monomorphic type and make a binding for that. It goes something like this:
-- starting point
main = do
let trans = \dict -> trace "hello" $ minus dict (fromRational dict (3%2)) (fromRational dict (2%1))
print (trans dictForDouble)
print (trans dictForDouble)
-- specialization
main = do
let trans = \dict -> trace "hello" $ minus dict (fromRational dict (3%2)) (fromRational dict (2%1))
let transForDouble = trans dictForDouble
print transForDouble
print transForDouble
-- inlining
main = do
let transForDouble = trace "hello" $ minus dictForDouble (fromRational dict (3%2)) (fromRational dictForDouble (2%1))
print transForDouble
print transForDouble
In this last one the function-ness is gone; it is "as if" GHC has cached the output of trans
when applied to the dictionary dictForDouble
. (If you compile with optimizations and -ddump-simpl
you will see it goes even further, doing constant-propagation to turn the minus ...
stuff into just D# -0.5##
. Whew!)
{-# LANGUAGE RankNTypes #-}
import Debug.Trace
--Does not get cached
data Translation = Trans { m :: forall a. Floating a => a }
f :: Floating a => a -> a
f x = trace "f" $ x - 2.0
Since a
is a rigid type variable bound by a type expected by the context
forall a. Floating a => a
you would have to cache the context as well
--Does get cached
data Translation' = Trans' { m' :: Float }
f' :: Float -> Float
f' x = trace "f'" $ x - 2.0
Since this is a value of type Float
it can be computed once and cached afterwards.
main :: IO ()
main = do
let
trans = Trans { m = f 1.5 }
trans' = Trans' { m' = f' 1.5}
putStrLn $ show $ (m trans :: Double)
putStrLn $ show $ (m trans :: Float)
-- ^ you can evaluate it with 2 different contexts
putStrLn $ show $ (m' trans' :: Float)
putStrLn $ show $ (m' trans' :: Float)
-- ^ context fixed
Note that the former one does not get cached whether compiler optimization is turned on or off.
When they are both Float
and you turn on optimization the problem is gone.
If you compile the larger system with optimization and it is to inefficient on some metric I would suspect that the problem lies somewhere else.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With