 

Haskell: partially drop lazy evaluated results

I have a very large decision tree. It is used as follows:

-- once per application start
t :: Tree
t = buildDecisionTree
-- done several times
makeDecision :: Something -> Decision
makeDecision something = search t something

This decision tree is way too large to fit in memory. But, thanks to lazy evaluation, it is only partially evaluated.
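To make the setup concrete, here is a minimal sketch of such a lazily built tree. Everything in it is hypothetical: `Decision` is an `Int`, `Something` is a path of `Bool` choices, and `buildDecisionTree` builds an infinite binary tree whose nodes are only constructed when a search reaches them. Because `t` is a top-level constant, every node that has been built stays reachable for the rest of the program. That is the retention problem described next.

```haskell
import Debug.Trace (trace)

-- Hypothetical stand-ins for the types in the question.
type Decision = Int
type Something = [Bool] -- a path: False = go left, True = go right

-- An infinite binary tree; each node carries the decision for its path.
data Tree = Node Decision Tree Tree

-- Nodes are built lazily; trace reports when each one is actually constructed.
buildDecisionTree :: Tree
buildDecisionTree = go 1
  where
    go n = trace ("building node " ++ show n) (Node n (go (2 * n)) (go (2 * n + 1)))

search :: Tree -> Something -> Decision
search (Node d _ _) [] = d
search (Node _ l r) (b : bs) = search (if b then r else l) bs

-- A top-level constant (CAF): built at most once and retained afterwards.
t :: Tree
t = buildDecisionTree

makeDecision :: Something -> Decision
makeDecision something = search t something

main :: IO ()
main = do
  print (makeDecision [True, False]) -- builds nodes 1, 3 and 6
  print (makeDecision [True, True])  -- reuses nodes 1 and 3, builds node 7
```

Calling `makeDecision` on every possible path would eventually keep all visited nodes alive through `t`.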

The problem is that there are scenarios where all possible decisions are tried, causing the whole tree to be evaluated. This is not going to terminate, but it should not cause a memory overflow either. Furthermore, if this process is aborted, memory usage does not decrease, because the huge subtree that has already been evaluated is still kept in memory.

One solution would be to rebuild the tree every time makeDecision is called, but this would lose the benefit of caching decisions and significantly slow down makeDecision.

I would like to take a middle course. In my application it is very common to make successive decisions that share a common path prefix in the tree. So I would like to cache the most recently used path but drop the others, so that they are reevaluated the next time they are used. How can I do this in Haskell?

asked Jan 18 '13 at 08:01 by ipsec




1 Answer

This is not possible in pure Haskell; see the question "Can a thunk be duplicated to improve memory performance?" (as pointed out by @shang). You can, however, do it with IO.
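The underlying reason is sharing: once a let-bound (or top-level) thunk has been evaluated, every reference to it sees the evaluated result, and pure code cannot turn it back into an unevaluated thunk. A minimal illustration, with a hypothetical `expensive` function whose `trace` message shows how often it runs:

```haskell
import Debug.Trace (trace)

-- Hypothetical costly computation; trace shows when it is evaluated.
expensive :: Int -> Int
expensive n = trace "evaluating expensive" (sum [1 .. n])

main :: IO ()
main = do
  let x = expensive 1000000
  -- Both occurrences of x refer to the same thunk, so it is evaluated once.
  print (x + x)
  -- x is already evaluated here; no further trace message appears.
  print x
```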

We start with the module header and export only the type and the functions that keep this module safe to use, even though it uses unsafePerformIO internally. It is also possible to do this without unsafePerformIO, but then the user would have to keep more of their code in IO.

{-# LANGUAGE ExistentialQuantification #-}
module ReEval (ReEval, newReEval, readReEval, resetReEval) where

import Data.IORef
import System.IO.Unsafe

Next, we define a data type that stores a value in a way that prevents all sharing: it keeps the function and the argument apart and only applies the function when we want the value. Note that the value returned by unsharedValue can be shared, but not with the return values of other invocations (assuming the function does something non-trivial):

data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x
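A small check of what Unshared buys us, reusing the two definitions above together with a hypothetical `square` function. Each call to unsharedValue builds a fresh application, so the work is repeated, whereas binding the result once shares it. This is the behaviour to expect under runhaskell, as used at the end of this answer; with optimizations enabled, GHC's common-subexpression elimination and let-floating may merge identical expressions, so the trace count is not guaranteed there.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Debug.Trace (trace)

-- From the answer: the function and its argument are stored separately.
data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x

-- Hypothetical function whose evaluation we want to observe.
square :: Int -> Int
square n = trace ("computing square of " ++ show n) (n * n)

main :: IO ()
main = do
  let u = Unshared square 7
  -- Two separate applications: square 7 is computed for each print.
  print (unsharedValue u)
  print (unsharedValue u)
  -- A result bound once is shared: one computation for both uses.
  let v = unsharedValue u
  print (v + v)
```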

Now we define our data type of resettable computations. We need to store the computation and the current value. The latter is stored in an IORef, as we want to be able to reset it.

data ReEval a = ReEval {
    calculation :: Unshared a,
    currentValue :: IORef a
    }

To wrap a value in a ReEval box, we need both a function and an argument. Why not just a -> ReEval a? Because then there would be no way to prevent the parameter from being shared.

newReEval :: (b -> a) -> b -> ReEval a
newReEval f x = unsafePerformIO $ do
    let c = Unshared f x
    ref <- newIORef (unsharedValue c)
    return $ ReEval c ref
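To see why newReEval takes a function and an argument rather than a plain value, consider a hypothetical naive variant that stores the value itself (written directly in IO instead of with unsafePerformIO, for brevity). Resetting writes the original thunk back, but that thunk is the very one that was already evaluated, so nothing is recomputed and nothing is freed.

```haskell
import Data.IORef
import Debug.Trace (trace)

-- Hypothetical naive variant: remembers the value instead of (function, argument).
data NaiveReEval a = NaiveReEval a (IORef a)

newNaive :: a -> IO (NaiveReEval a)
newNaive x = NaiveReEval x <$> newIORef x

readNaive :: NaiveReEval a -> IO a
readNaive (NaiveReEval _ ref) = readIORef ref

-- Writes back the original thunk; it is shared with the value that was
-- already read, so it is already evaluated and stays in memory.
resetNaive :: NaiveReEval a -> IO ()
resetNaive (NaiveReEval x ref) = writeIORef ref x

main :: IO ()
main = do
  r <- newNaive (trace "computing" (length [1 .. 100000 :: Int]))
  readNaive r >>= print -- triggers the trace message, then prints the result
  resetNaive r
  readNaive r >>= print -- no second trace message: the result was never dropped
```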

Reading is simple: just get the value from the IORef. This use of unsafePerformIO is safe because we always get the value of unsharedValue c, although possibly a different “copy” of it.

readReEval :: ReEval a -> a
readReEval r = unsafePerformIO $ readIORef (currentValue r)

And finally, the resetting. I left it in the IO monad, not because it would be any less safe than the other functions to wrap in unsafePerformIO, but because this is the easiest way to give the user control over when the resetting actually happens. You don’t want to risk all your calls to resetReEval being lazily delayed until your memory has run out, or even optimized away because there is no return value to use.

resetReEval :: ReEval a -> IO ()
resetReEval r = writeIORef (currentValue r) (unsharedValue (calculation r))

This is the end of the module. Here is example code:

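import Debug.Trace
import ReEval

main :: IO ()
main = do
    -- trace reports each time func is actually evaluated
    let func a = trace ("func " ++ show a) (negate a)
    let l = [ newReEval func n | n <- [1..5] ]
    print (map readReEval l)   -- evaluates every element once
    print (map readReEval l)   -- cached: no trace output
    mapM_ resetReEval l
    print (map readReEval l)   -- recomputed after the reset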

As you can see, it does what is expected:

$ runhaskell test.hs 
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
[-1,-2,-3,-4,-5]
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
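Applied to the question, one possible sketch (an assumption, not part of the answer above) wraps each child of a tree node in a ReEval. After a decision, the hypothetical `dropOffPath` resets every sibling that is not on the path to keep, so the cached path survives while the other subtrees are released and rebuilt on demand. The ReEval definitions are repeated so the file compiles on its own; `Tree`, `build`, `search` and `dropOffPath` are illustrative names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

import Data.IORef
import Debug.Trace (trace)
import System.IO.Unsafe (unsafePerformIO)

-- ReEval, repeated from the answer so that this file compiles on its own.
data Unshared a = forall b. Unshared (b -> a) b

unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x

data ReEval a = ReEval {calculation :: Unshared a, currentValue :: IORef a}

newReEval :: (b -> a) -> b -> ReEval a
newReEval f x = unsafePerformIO $ do
  let c = Unshared f x
  ref <- newIORef (unsharedValue c)
  return (ReEval c ref)

readReEval :: ReEval a -> a
readReEval r = unsafePerformIO (readIORef (currentValue r))

resetReEval :: ReEval a -> IO ()
resetReEval r = writeIORef (currentValue r) (unsharedValue (calculation r))

-- Hypothetical decision tree: both children of a node are resettable.
data Tree = Leaf Int | Node (ReEval Tree) (ReEval Tree)

-- Builds the subtree of the given depth rooted at label n; leaves are
-- traced so that (re)computation is visible.
build :: (Int, Int) -> Tree
build (depth, n)
  | depth == 0 = trace ("leaf " ++ show n) (Leaf n)
  | otherwise =
      Node
        (newReEval build (depth - 1, 2 * n))
        (newReEval build (depth - 1, 2 * n + 1))

-- Follows a path of choices (False = left, True = right) down to a leaf.
search :: Tree -> [Bool] -> Int
search (Leaf n) _ = n
search (Node l r) (b : bs) = search (readReEval (if b then r else l)) bs
search (Node _ _) [] = error "search: path too short"

-- Keeps the subtrees along the given path and resets every sibling off it.
-- The empty-path clause comes first so the final kept child is not forced.
dropOffPath :: Tree -> [Bool] -> IO ()
dropOffPath _ [] = return ()
dropOffPath (Leaf _) _ = return ()
dropOffPath (Node l r) (b : bs) = do
  let (keep, other) = if b then (r, l) else (l, r)
  resetReEval other
  dropOffPath (readReEval keep) bs

main :: IO ()
main = do
  let t = build (3, 1)
  print (search t [True, False, True])  -- computes leaf 13
  print (search t [True, False, True])  -- cached, no trace
  dropOffPath t [True, False, False]    -- keep the shared prefix, reset the rest
  print (search t [True, False, False]) -- computes leaf 12
  print (search t [True, False, True])  -- leaf 13 was reset, so it is computed again
```

Resetting releases a subtree only if nothing else still refers to its old value, for example a Tree obtained from an earlier readReEval and kept around. Whether to keep only the last path, as here, or the last few is a design choice for the application.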
answered Oct 24 '22 at 12:10 by Joachim Breitner