Consider the following example:
safeMapM f xs = safeMapM' xs []
  where
    safeMapM' [] acc = return $ reverse acc
    safeMapM' (x:xs) acc = do
      y <- f x
      safeMapM' xs (y:acc)
mapM return largelist -- Causes stack space overflow on large lists
safeMapM return largelist -- Seems to work fine
Using mapM on large lists causes a stack space overflow, while safeMapM seems to work fine (using GHC 7.6.1 with -O2). However, I was not able to find a function similar to safeMapM in the Haskell standard libraries.
Is it still considered good practice to use mapM (or sequence, for that matter)?
If so, why is it considered good practice despite the danger of stack space overflows?
If not, which alternative do you suggest?
As Niklas B. noted, the semantics of mapM are those of an effectful right fold, and it terminates successfully in more cases than a flipped version. In general, mapM makes more sense, as it is rare that we would want to do a result-yielding map on an enormous list of data. More commonly, we'll want to evaluate such a list for its effects, and in that case mapM_ and sequence_, which throw away the results, are what is typically recommended.
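To make the two points above concrete, here is a small sketch using only base. The names lazyPrefix, agreeOnFinite, and printAll are made up for illustration, and safeMapM is the accumulator version from the question. It assumes a lazy monad (Identity) to show a case where the right-fold mapM yields a result while an accumulate-and-reverse version could not.

```haskell
import Data.Functor.Identity (Identity (..))

-- The accumulator-based variant from the question, with a type signature.
safeMapM :: Monad m => (a -> m b) -> [a] -> m [b]
safeMapM f xs0 = go xs0 []
  where
    go []       acc = return (reverse acc)
    go (x : xs) acc = do
      y <- f x
      go xs (y : acc)

-- In a lazy monad such as Identity, mapM builds its result incrementally,
-- so a prefix of a traversal over an infinite list can be consumed.
-- safeMapM would loop here, because it only reverses the accumulator
-- once the input is exhausted.
lazyPrefix :: [Int]
lazyPrefix = take 3 (runIdentity (mapM return [1 ..]))

-- On finite input both versions produce the same list.
agreeOnFinite :: Bool
agreeOnFinite =
  runIdentity (safeMapM return [1 .. 10 :: Int])
    == runIdentity (mapM return [1 .. 10])

-- When only the effects matter, mapM_ discards the results instead of
-- collecting them into a list.
printAll :: [Int] -> IO ()
printAll = mapM_ print
```

Loading this in GHCi and evaluating lazyPrefix, agreeOnFinite, or printAll [1 .. 5] is enough to try it out.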
Edit: in other words, despite the issue raised in the question, yes, mapM and sequence are commonly used and typically considered good practice.
If so, why is it considered to be good practice despite the danger of stack space overflows? If not which alternative do you suggest to use?
If you want to process the list elements as they are generated, use either pipes or conduit. Neither builds up an intermediate list.
I'll show the pipes way, since that is my library. I'll begin with an infinite list of numbers generated in the IO monad from user input:
import Control.Proxy

infiniteInts :: (Proxy p) => () -> Producer p Int IO r
infiniteInts () = runIdentityP $ forever $ do
    n <- lift readLn
    respond n
Now, I want to print them as they are generated. That requires defining a downstream handler:
printer :: (Proxy p) => () -> Consumer p Int IO r
printer () = runIdentityP $ forever $ do
    n <- request ()
    lift $ print n
Now I can connect the Producer and Consumer using (>->), and run the result using runProxy:
>>> runProxy $ infiniteInts >-> printer
4<Enter>
4
7<Enter>
7
...
That will then read Ints from the user and echo them back to the console as they are generated, without keeping more than a single element in memory.
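The code above is written against the Control.Proxy module of pipes 3.x. As a rough sketch only, assuming pipes 4.x (where the Pipes module provides yield, await, (>->), and runEffect), the same pipeline might look like the following. The name echo is made up for illustration.

```haskell
import Control.Monad (forever)
import Pipes

-- Assumes pipes 4.x: a Producer emits values downstream with yield.
infiniteInts :: Producer Int IO r
infiniteInts = forever $ do
    n <- lift readLn   -- read one Int from stdin
    yield n            -- hand it downstream

-- A Consumer pulls values from upstream with await.
printer :: Consumer Int IO r
printer = forever $ do
    n <- await
    lift $ print n

-- Connect the two stages and run the resulting Effect.
echo :: IO ()
echo = runEffect $ infiniteInts >-> printer
```

Running echo should behave like the session shown above: each number is printed as soon as it has been read.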
So usually, if you want an effectful computation that generates a stream of elements and consumes them immediately, you don't want mapM. Use a proper streaming library.
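The answer names conduit as the other option but only demonstrates pipes. For comparison, here is a minimal sketch of the same echo pipeline, assuming conduit 1.3 or later (the ConduitT type with runConduit, (.|), yield, and awaitForever from Data.Conduit). The names source, sink, and echoConduit are made up for illustration.

```haskell
import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import Data.Void (Void)

-- Assumes conduit >= 1.3. Reads Ints from stdin and yields them downstream.
source :: ConduitT () Int IO ()
source = forever $ do
    n <- liftIO readLn
    yield n

-- Prints every value it receives; produces no output of its own.
sink :: ConduitT Int Void IO ()
sink = awaitForever $ \n -> liftIO (print n)

-- Fuse the two stages with (.|) and run the pipeline.
echoConduit :: IO ()
echoConduit = runConduit $ source .| sink
```

As with the pipes version, each element is handled as soon as it arrives and no intermediate list is built.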
If you want to learn more about pipes, then I recommend reading the tutorial.