Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Baffled by the Arrow in Do Statements in Haskell

Tags:

I am working on understanding the State monad and have written two simple versions of the famous fibonacci to memoize the function. The one with let in the body runs very slowly. The one with <- runs very fast. My question: Why? Is let causing full evaluation somehow while <- is allowing the M.lookup via Data.Map to work?

-- using state monad and let  -- very slow  fibLet :: Int -> State (M.Map Int Integer) Integer fibLet n = do    case n of 0 -> return 0              1 -> return 1              n -> do                    mp <- get                   if M.member n mp                    then return $ fromJust (M.lookup n mp)                   else do                        let s1 = evalState (fibLet (n - 1)) mp                                                let s2 = evalState (fibLet (n - 2)) mp                        let s3 = s1+s2                        modify $ M.insert n s3                        return s3   fibonacciLet :: Int -> Integer fibonacciLet n = evalState (fibLet n) M.empty 
-- using state monad and <-  -- very fast  fibArrow :: Int -> State (M.Map Int Integer) Integer fibArrow n = do     case n of 0 -> return 0               1 -> return 1               n -> do                    mp <- get                    if M.member n mp                    then return $ fromJust (M.lookup n mp)                    else do                         s1 <-fibArrow (n - 1)                         s2 <-fibArrow (n - 2)                         let s3 = s1+s2                         modify $ M.insert n s3                         return s3   fibonacciArrow :: Int -> Integer fibonacciArrow n = evalState (fibArrow n) M.empty 
like image 862
dsagman Avatar asked Oct 12 '21 22:10

dsagman


1 Answers

fibLet isn't actually using its state very much at all! That's why it's so slow; it's essentially a very complicated way to write the classic Haskell fibonacci definition:

fib 0 = 0 fib 1 = 1 fib n = fib (n - 1) + fib (n - 2) 

Look carefully what's going on here:

-- from earlier: mp <- get do    let s1 = evalState (fibLet (n - 1)) mp                            let s2 = evalState (fibLet (n - 2)) mp    let s3 = s1+s2    modify $ M.insert n s3    return s3 

mp is the Map of currently-known results. You use evalState to run fibLet (n - 1), starting its state with mp. Then you use evalState again to run fibLet (n - 2), starting its state with mp again. This means fibLet (n - 1) and fibLet (n - 2) aren't sharing any work; each of them is using the same starting map of already known results mp, so anything that isn't already in that map is going to have to be computed by both branches.

However it's actually even worse than that. Look at the type of evalState:

evalState :: State s a -> s -> a 

It has no State in its return type. That means it isn't actually stateful. It does not interact with a any surrounding state; effectively it starts off a new state thread, runs it to completion1, after which the state is discarded.

You can tell that really easily by looking at the type of evalState slightly differently (but equivalently):

evalState :: State s a -> (s -> a) 

evalState turns a State s a into a simple function typed s -> a. Obviously a plain old function can't alter the implicit state of a do block its called from (that's the whole point of making statefulness explicit in the types). So that means:

evalState (fibLet (n - 1)) :: M.Map Int Integer -> Integer 

evalState (fibLet (n - 1)) is just a plain old function from maps to integers. It neither accesses nor affects state of any kind.

So that means that after let s1 = evalState (fibLet (n - 1)) mp the state in the outer do block is still exactly equal to mp. Nothing has been saved in our state map of already computed results. So not only are we not sharing any work between the two separate recursive calls, they're not even sharing work within the deeper layers of recursion inside each call!

To prove it, try running fibLet using runState instead of evalState, so you can see what the final map is:

ghci> runState (fibLet 20) M.empty (6765,fromList [(20,6765)]) it :: (Integer, M.Map Int Integer) 

There's only a single entry in the map, added as the last step of the do block in the outermost call to fibLet, just before it returned.

If you do the same thing with fibArrow you get this:

ghci> runState (fibArrow 20) M.empty (6765,fromList [(2,1),(3,2),(4,3),(5,5),(6,8),(7,13),(8,21),(9,34),(10,55),(11,89),(12,144),(13,233),(14,377),(15,610),(16,987),(17,1597),(18,2584),(19,4181),(20,6765)]) it :: (Integer, M.Map Int Integer) 

You can clearly see it's memoized all the intermediate results, not just the final one.

Bottom line: You don't usually want to use evalState (and similar functions like runState and execState) inside a function that's operating in the State monad. Those functions are intended for running the whole stateful computation, and so are normally run from "outside" a state monad context. When you happen to run them "inside" a state monad context, they do not interact with it; they instead receive (and return, in the case of runState and execState) state through ordinary argument passing and function return, not through the implicit state threading provided by State.

If you want to make a call to a "stateful sub-computation" (i.e. call a State _ _ inside a function returning State _ _), then you need to bind the sub-computation as part of the outer one. The arrow statement inside do blocks does this; the let statement doesn't, it only gives a name to an ordinary expression. That's why you found yourself having to use evalState to get a result and explicitly feed mp even though you're already inside a stateful context where it should be available implicitly; the messiness of that was a hint that something wasn't right.


1 Well, it "runs it to completion" assuming that the result value is fully demanded.

like image 93
Ben Avatar answered Jan 01 '23 05:01

Ben