Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is mapM in Haskell strict? Why does this program get a stack overflow?

The following program terminates correctly:

import System.Random

randomList = mapM (\_->getStdRandom (randomR (0, 50000::Int))) [0..5000]

main = do
  randomInts <- randomList
  print $ take 5 randomInts

Running:

$ runhaskell test.hs
[26156,7258,29057,40002,26339]

However, feeding it with an infinite list, the program never terminates, and when compiled, eventually gives a stack overflow error!

import System.Random

randomList = mapM (\_->getStdRandom (randomR (0, 50000::Int))) [0..]

main = do
  randomInts <- randomList
  print $ take 5 randomInts

Running,

$ ./test
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

I expected the program to lazily evaluate getStdRandom each time I pick an item off the list, finishing after doing so 5 times. Why is it trying to evaluate the whole list?

Thanks.

Is there a better way to get an infinite list of random numbers? I want to pass this list into a pure function.

EDIT: Some more reading revealed that the function

randomList r = do g <- getStdGen
                  return $ randomRs r g

is what I was looking for.

EDIT2: after reading camccann's answer, I realized that getStdGen is getting a new seed on every call. Instead, better to use this function as a simple one-shot random list generator:

import System.Random

randomList :: Random a => a -> a -> IO [a]
randomList r g = do s <- newStdGen
                    return $ randomRs (r,g) s

main = do r <- randomList 0 (50::Int)
          print $ take 5 r

But I still don't understand why my mapM call did not terminate. Evidently not related to random numbers, but something to do with mapM maybe.

For example, I found that the following also does not terminate:

randomList = mapM (\_->return 0) [0..]

main = do
  randomInts <- randomList
  print $ take 50000 randomInts

What gives? By the way, IMHO, the above randomInts function should be in System.Random. It's extremely convenient to be able to very simply generate a random list in the IO monad and pass it into a pure function when needed, I don't see why this should not be in the standard library.

like image 444
Steve Avatar asked Jul 29 '10 01:07

Steve


2 Answers

Random numbers in general are not strict, but monadic binding is--the problem here is that mapM has to sequence the entire list. Consider its type signature, (a -> m b) -> [a] -> m [b]; as this implies, what it does is first map the list of type [a] into a list of type [m b], then sequence that list to get a result of type m [b]. So, when you bind the result of applying mapM, e.g. by putting it on the right-hand side of <-, what this means is "map this function over the list, then execute each monadic action, and combine the results back into a single list". If the list is infinite, this of course won't terminate.

If you simply want a stream of random numbers, you need to generate the list without using a monad for each number. I'm not entirely sure why you've used the design you have, but the basic idea is this: Given a seed value, use a pseudo-random number generator to produce a pair of 1) a random number 2) a new seed, then repeat with the new seed. Any given seed will of course provide the same sequence each time. So, you can use the function getStdGen, which will provide a fresh seed in the IO monad; you can then use that seed to create an infinite sequence in completely pure code.

In fact, System.Random provides functions for precisely that purpose, randoms or randomRs instead of random and randomR.

If for some reason you want to do it yourself, what you want is essentially an unfold. The function unfoldr from Data.List has the type signature (b -> Maybe (a, b)) -> b -> [a], which is fairly self-explanatory: Given a value of type b, it applies the function to get either something of type a and a new generator value of type b, or Nothing to indicate the end of the sequence.

You want an infinite list, so will never need to return Nothing. Thus, partially applying randomR to the desired range and composing it with Just gives this:

Just . randomR (0, 50000::Int) :: (RandomGen a) => a -> Maybe (Int, a)

Feeding that into unfoldr gives this:

unfoldr (Just . randomR (0, 50000::Int)) :: (RandomGen a) => a -> [Int]

...which does exactly as it claims: Given an instance of RandomGen, it will produce an infinite (and lazy) list of random numbers generated from that seed.

like image 178
C. A. McCann Avatar answered Nov 17 '22 13:11

C. A. McCann


I would do something more like this, letting randomRs do the work with an initial RandomGen:

#! /usr/bin/env runhaskell

import Control.Monad
import System.Random


randomList :: RandomGen g => g -> [Int]
randomList = randomRs (0, 50000)

main :: IO ()
main = do
   randomInts <- liftM randomList newStdGen
   print $ take 5 randomInts

As for the laziness, what's happening here is that mapM is (sequence . map)

Its type is: mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

It's mapping the function, giving a [m b] and then needs to execute all those actions to make an m [b]. It's the sequence that'll never get through the infinite list.

This is explained better in the answers to a prior question: Is Haskell's mapM not lazy?

like image 6
dino Avatar answered Nov 17 '22 12:11

dino