Consider this function that doubles all the elements in a list:
doubleMe [] = [] doubleMe (x:xs) = (2*x):(doubleMe xs)
Then consider the expression
doubleMe (doubleMe [a,b,c])
It seems obvious that, at runtime, this first expands to:
doubleMe ( (2*a):(doubleMe [b,c]) )
(It's obvious because no other possibilities exist as far as I can see).
But my question is this: Why exactly does this now expand to
2*(2*a) : doubleMe( doubleMe [b,c] )
instead of
doubleMe( (2*a):( (2*b) : doubleMe [c] ) )
?
Intuitively, I know the answer: Because Haskell is lazy. But can someone give me a more precise answer?
Is there something special about lists that causes this, or is the idea more general than that just lists?
In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed (non-strict evaluation) and which also avoids repeated evaluations (sharing).
Haskell is a lazy language. It does not evaluate expressions until it absolutely must. This frequently allows our programs to save time by avoiding unnecessary computation, but they are at more of a risk to leak memory. There are ways of introducing strictness into our programs when we don't want lazy evaluation.
"Lazy" evaluation is performing operations when and as they are needed. It is useful when it is a feature of a programming language or library because it is generally harder to implement lazy evaluation on your own than simply to precalculate everything up front.
Lazy evaluation means that Spark does not evaluate each transformation as they arrive, but instead queues them together and evaluate all at once, as an Action is called. The benefit of this approach is that Spark can make optimization decisions after it had a chance to look at the DAG in entirety.
doubleMe (doubleMe [a,b,c])
does not expand to doubleMe ( (2*a):(doubleMe [b,c]) )
. It expands to:
case doubleMe [a,b,c] of [] -> [] (x:xs) -> (2*x):(doubleMe xs)
That is the outer function call is expanded first. That's the main difference between a lazy language and a strict one: When expanding a function call, you don't first evaluate the argument - instead you replace the function call with its body and leave the argument as-is for now.
Now the doubleMe
needs to be expanded because the pattern matching needs to know the structure of its operand before it can be evaluated, so we get:
case (2*a):(doubleMe [b,c]) of [] -> [] (x:xs) -> (2*x):(doubleMe xs)
Now the pattern matching can be replaced with the body of the second branch because we now know that the second branch is the one that matches. So we substitute (2*a)
for x
and doubleMe [b, c]
for xs
, giving us:
(2*(2*a)):(doubleMe (doubleMe [b,c]))
So that's how we arrive at that result.
Your “obvious” first step isn't actually quite so obvious. In fact what happens is rather like this:
doubleMe (...) doubleMe ( { [] | (_:_) }? ) doubleMe ( doubleMe (...)! )
and only at that point does it actually “enter” the inner function. So it proceeds
doubleMe ( doubleMe (...) ) doubleMe ( doubleMe( { [] | (_:_) }? ) ) doubleMe ( doubleMe( a:_ ! ) ) doubleMe ( (2*a) : doubleMe(_) ) doubleMe ( (2*a):_ ! )
now here the outer doubleMe
function has the “answer” to it's [] | (_:_)
question, which was the only reason anything in the inner function was evaluated at all.
Actually, the next step is also not necessarily what you though: it depends on how you evaluate the outer result! For instance, if the whole expression was tail $ doubleMe ( doubleMe [a,b,c] )
, then it would actually expand more like
tail( { [] | (_:_) }? ) tail( doubleMe(...)! ) tail( doubleMe ( { [] | (_:_) }? ) ) ... tail( doubleMe ( doubleMe( a:_ ! ) ) ) tail( doubleMe ( _:_ ) ) tail( _ : doubleMe ( _ ) ) doubleMe ( ... )
i.e. it would in fact never really get to 2*a
!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With