Will the result of `let` and `where` expressions be stored in Haskell?

I am rather new to Haskell, and after reading this and some performance tips on strictness I am still wondering how this applies to let and where expressions. If I have code like:

f :: Int -> Int -> Int
f a b
  | a == b    = <simple computation>
  | otherwise = e1 + 2 * e1 - e1^2
  where e1 = <lengthy computation>

how often will <lengthy computation> be evaluated? I assume that, given Haskell's lazy evaluation, e1 will not be evaluated at all if a==b. But if not: is e1 substituted into the otherwise expression and then evaluated every time it is encountered, or is it evaluated when it is first encountered and then stored and reused in all subsequent occurrences? Also:

  • is there a way to control this process "manually"?
  • does this depend on whether I run the code in GHCi or compile it with GHC, and within GHC compilation, does it depend on flags like -O?

This is quite similar to this question, but I could not find answers for Haskell.

Explanations are much appreciated.

asked May 02 '17 by Chris



2 Answers

As a rule, code in the where or let block of a constant applicative form is evaluated only once, and only as deep as necessary (i.e., if it's not used at all it also won't be evaluated at all).
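For instance, a minimal sketch of the CAF case (the name table and the computation here are made up for illustration):

table :: Integer
table = s + s
  where s = sum [1..1000000]
-- table takes no arguments, so it is a constant applicative form:
-- s is computed at most once per program run, on first demand,
-- and the result is shared by both uses.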

f is not a constant applicative form because it has arguments; it's equivalent to

f' = \a b -> let e1 = <lengthy computation>
             in if a==b
                 then <simple computation>
                 else e1 + 2 * e1 - e1^2

So, e1 is evaluated once every time you call the function with both arguments. This is likely also what you want, and in fact the best behaviour possible if <lengthy computation> depends on both a and b. If it, say, only depends on a, you can do better:

f₂ a = \b ->
  if a == b then <simple computation>
            else e1 + 2 * e1 - e1^2
  where e1 = <lengthy computation>

This form will be more efficient when you do e.g. map (f₂ 34) [1,3,9,2,9]: in that example, e1 would only be computed once for the entire list. (But <lengthy computation> won't have b in scope, so it can't depend on it.)
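You can watch that sharing happen with Debug.Trace (a sketch; the name g2, the trace message, and the stand-in computations 0 and a * a are mine):

import Debug.Trace (trace)

g2 :: Int -> Int -> Int
g2 a = \b -> if a == b then 0 else e1 + 2 * e1 - e1^2
  where e1 = trace "e1 computed" (a * a)

-- ghci> map (g2 34) [1,3,9,2,9]
-- "e1 computed" is printed only once for the whole list, because the
-- partial application g2 34 carries a single shared thunk for e1.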

OTOH, there can also be scenarios where you don't want e1 to be kept at all (e.g. if it occupies a lot of memory but is rather quick to compute). In this case, you can just make it a “nullary function”:

f₃ a b
  | a==b = <simple computation>
  | otherwise = e1() + 2 * e1() - e1()^2
 where e1 () = <lengthy computation>

Functions are not memoized by default, so in the above, <lengthy computation> is done zero times if a==b and three times otherwise.
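This, too, can be observed with Debug.Trace (same caveats as before; g3 and the stand-in computations are mine). Note this holds in GHCi or unoptimised builds; with -O the compiler may reintroduce the sharing, which is the caveat discussed below.

import Debug.Trace (trace)

g3 :: Int -> Int -> Int
g3 a b
  | a == b    = 0
  | otherwise = e1 () + 2 * e1 () - e1 () ^ 2
  where e1 () = trace "e1 computed" (a * b)

-- ghci> g3 3 3   -- prints nothing: e1 is never called
-- ghci> g3 3 4   -- prints "e1 computed" three times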

Yet another possibility is to force e1 to always be evaluated exactly once. You can do that with seq:

f₄ a b = e1 `seq` if a==b
          then <simple computation>
          else e1 + 2 * e1 - e1^2
 where e1 = <lengthy computation>

This is the only one of the suggestions that actually changes the semantics, not just the performance: suppose we always define e1 = error "too tough". Then f, f', f₂ and f₃ will all still work provided that a==b; f₄, however, will fail even in that case.
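Concretely, with e1 = error "too tough", a GHCi session would look something like this (a hypothetical transcript; writing f4 for f₄, and with 42 standing in for <simple computation>):

ghci> f 3 3    -- e1 is never demanded
42
ghci> f4 3 3   -- seq demands e1 before anything else
*** Exception: too tough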


As for optimisations (-O or -O2): these generally won't change anything about the strictness properties of your program (i.e., they can't turn the behaviour of f into that of f₄). Beyond that, the compiler is pretty much free to make any change it considers beneficial to performance, but usually it will not change anything about what I said above. The main exception, as Taren remarks, is something like f₃: the compiler will readily inline e1 () and then share a reference to the computed value, which prevents the garbage collector from reclaiming the memory. So it's better not to rely on this (admittedly somewhat hackish) technique.
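(If you genuinely need to prevent such sharing, GHC's -fno-full-laziness flag disables the let-floating transformation that introduces it, though it can also lose sharing you did want elsewhere in the module.)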

answered Sep 22 '22 by leftaroundabout

You can actually check how GHC will optimize your code:

ghc -ddump-simpl -dsuppress-idinfo -dsuppress-coercions -dsuppress-type-applications -dsuppress-uniques -dsuppress-module-prefixes -fforce-recomp .\scratch.hs

This is a bit of a mouthful, so you might want to alias it. The output depends heavily on the optimization level, so it is worth trying at each one.

With g i = sum [1..i] as the expensive computation and -O2, I get this output:

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 64, types: 23, coercions: 0}

Rec {
-- RHS size: {terms: 16, types: 3, coercions: 0}
$wgo :: Int# -> Int# -> Int#
$wgo =
  \ (w :: Int#) (ww :: Int#) ->
    case w of wild {
      __DEFAULT -> $wgo (+# wild 1#) (+# ww wild);
      10000# -> +# ww 10000#
    }
end Rec }

-- RHS size: {terms: 15, types: 1, coercions: 0}
f2 :: Int
f2 =
  case $wgo 1# 0# of ww { __DEFAULT ->
  I# (-# (+# ww (*# 2# ww)) (*# ww ww))
  }

-- RHS size: {terms: 2, types: 0, coercions: 0}
f1 :: Int
f1 = I# 42#

-- RHS size: {terms: 17, types: 8, coercions: 0}
f :: Int -> Int -> Int
f =
  \ (a :: Int) (b :: Int) ->
    case a of _ { I# x ->
    case b of _ { I# y ->
    case tagToEnum# (==# x y) of _ {
      False -> f2;
      True -> f1
    }
    }
    }

This is pretty ugly compared to your Haskell version, but with a bit of squinting it isn't much more complicated. $wgo is our expensive function. The interesting part is that f1 and f2, the possible return values of f, are computed only once, when they are first required; for the rest of the program run they are reused.
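For reference, the source producing this Core was presumably something like the following (a reconstruction; the original answer doesn't show the file, but the constants 42 and 10000 can be read off the dump):

g :: Int -> Int
g i = sum [1..i]

f :: Int -> Int -> Int
f a b
  | a == b    = 42
  | otherwise = e1 + 2 * e1 - e1^2
  where e1 = g 10000

Note that since this e1 depends on neither a nor b, GHC has floated it out into the top-level constant f2, a CAF, which is exactly why it is computed only once and then reused for the rest of the program run.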

answered Sep 25 '22 by Taren