Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell space usage compile time restrictions

Tags:

haskell

I quite like Haskell, but space leaks are a bit of a concern for me. I usually think Haskell's type system makes it safer than C++, however with a C-style loop I can be fairly certain it will complete without running out of memory, whereas a Haskell "fold" can run out of memory unless you're careful that the appropriate fields are strict.

I was wondering if there's a library that uses the Haskell type system to ensure various constructs can be compiled and run in a way that doesn't build up thunks. For example, no_thunk_fold would throw a compiler error if one was using it in a way that could build up thunks. I understand this may restrict what I can do, but I'd like a few functions I can use as an option which would make me more confident I haven't accidentally left an important unstrict field somewhere and that I'm going to run out of space.

like image 616
Clinton Avatar asked Mar 13 '13 00:03

Clinton


2 Answers

It sounds like you are worried about some of the down sides of lazy evaluation. You want to ensure your fold, loop, recursion is handled in constant memory.

The iteratee libraries were created solve this problem, pipes, conduit, enumerator, iteratee, iterIO.

The most popular and also recent are pipes and conduit. Both of which go beyond the iteratee model.

The pipes library focuses on being theoretically sound in an effort to eliminate bugs and to allow the constancy of design open up efficient yet high levels of abstraction(my words not the authors). It also offers bidirectional streams if desired which is a benefit so far unique to the library.

The conduit is not quite as theoretically as well founded as pipes but has the large benefit of currently having more associated libraries built on it for parsing and handling http streams, xml streams and more. Check out the conduit section at hackage in on the packages page. It is used yesod one of Haskell's larger and well known web frameworks.

I have enjoyed writing my streaming applications with pipes library in particular the ability to make proxy transformer stacks. When I have needed to fetch a web page or parse some xml I have been using the conduit libraries.

I should also mention io-streams which just did its first official release. It's aim is at IO in particular, no surprise it is in its name, and utilizing simpler type machinery, fewer type parameters, then pipes or conduit. The major down side is that you are stuck in the IO monad so it is not very helpful to pure code.

{-# language NoMonoMorphismRestriction #-}                                       
import Control.Proxy

Start with simple translation.

map (+1) [1..10]

becomes:

runProxy $ mapD (+1) <-< fromListS [1..10]

The iteratee like offerings a little more verbose for simple translations, but offer large wins with larger examples.

A example of a proxy, pipes library, that generates fibonacci numbers in constant sapce

fibsP = runIdentityK $ (\a -> do respond 1                                       
                                 respond 1                                       
                                 go 1 1)                                         
  where                                                                          
    go fm2 fm1 = do  -- fm2, fm1 represents fib(n-2) and fib(n-1)                                                            
        let fn = fm2 + fm1                                                       
        respond fn -- sends fn downstream                                                              
        go fm1 fn

These could streamed to the stdout with runProxy $ fibsP >-> printD -- printD prints only the downstream values, Proxies are the bidirectional offer of the pipes package.

You should check out the proxy tutorial and the conduit tutorial which I just found out is now at FP Complete's school of Haskell.

One method to find the mean would be:

> ((_,l),s) <- (`runStateT` 0) $ (`runStateT` 0) $ runProxy $  foldlD' ( flip $ const (+1)) <-< raiseK (foldlD' (+)) <-< fromListS [1..10::Int]
> let m = (fromIntegral . getSum) s / (fromIntegral . getSum) l
5.5

Now it is easy to add map or filter the proxy.

> ((_,l),s) <- (`runStateT` 0) $ (`runStateT` 0) $ runProxy $  foldlD' ( flip $ const (+1)) <-< raiseK (foldlD' (+)) <-< filterD even <-< fromListS [1..10::Int]

edit: code rewritten to take advantage of the state monad.

update:

On more method of doing multiple calculation over a large stream of data in a compassable fashion then writing direct recursion is demonstrated in the blog post beautiful folding. Folds are turned into data and combined while using a strict accumulator. I have not used this method with any regularity, but it does seem to isolate where strictness is required making it easier to apply. You should also look at an answer to another question similar question that implements the same method with applicative and may be easier to read depending on your predilections.

like image 104
Davorak Avatar answered Nov 08 '22 09:11

Davorak


Haskell's type system can't do that. We can prove this with a fully polymorphic term to eat arbitrary amounts of ram.

takeArbitraryRAM :: Integer -> a -> a
takeArbitraryRAM i a = last $ go i a where
  go n x | n < 0 = [x]
  go n x | otherwise = x:go (n-1) x

To do what you want requires substructural types. Linear logic corresponds to an efficiently computable fragment of the lambda calculus (you would also need to control recursion though). Adding the structure axioms allows you to take super exponential time.

Haskell lets you fake linear types for the purposes of managing some resources using index monads. Unfortunately space and time are baked in to the language, so you can't do that for them. You can do what is suggested in a comment, and use a Haskell DSL to generate code that has performance bounds, but computing terms in this DSL could take arbitrary long and use arbitrary space.

Don't worry about space leaks. Catch them. Profile. Reason about your code to prove complexity bounds. This stuff you just have to do no matter what language you are using.

like image 44
Philip JF Avatar answered Nov 08 '22 11:11

Philip JF