I often have to map multiple functions over the same data. I have implemented dpMap to do this for me:
dpMap fns = (`map` fns) . flip ($)
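For reference, expanding the point-free definition by hand shows that dpMap applies every function in the list to the same argument; an equivalent pointful version (call it dpMap' here, just for illustration) would be:
-- dpMap fns dt
--   = ((`map` fns) . flip ($)) dt
--   = (`map` fns) (flip ($) dt)
--   = map ($ dt) fns
dpMap' :: [a -> b] -> a -> [b]
dpMap' fns dt = map ($ dt) fns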
dpMap is one function; does this mean the data dt is read just once (like a circuit fed with the same input)? A point-free system reminds me of a circuit: just the plumbing, no data.
As an example consider calculating the minimum and maximum of a list dt.
minimax dt = (dpMap [minimum, maximum]) dt
(I could get rid of the dt, but then I'd have to use -XNoMonomorphismRestriction.)
Is there a performance advantage over implementing the same function in a pointful form like this?
minimax2 dt = [minimum dt, maximum dt]
EDIT: Is there a general implementation of dpMap which works with constant memory?
I found another nice blog post: http://www.haskellforall.com/2013/08/composable-streaming-folds.html; hope this helps.
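For a sketch of how such a composable fold might look (assuming the foldl package, which grew out of that post): Fold values form an Applicative, so several folds can be combined and then run together in a single, constant-memory pass.
import qualified Control.Foldl as L

-- Combine two folds with Applicative syntax; L.fold then runs them
-- together in one pass over the input list.
minimaxF :: Ord a => [a] -> (Maybe a, Maybe a)
minimaxF = L.fold ((,) <$> L.minimum <*> L.maximum)
Usage: minimaxF [1..100] gives (Just 1, Just 100).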
EDIT2: After some more context, here is a solution. Even though I don't have an exact implementation of dpMap, the pattern is simple enough that it doesn't warrant a separate function:
minimax = (,) <$> minimum <*> maximum
Usage:
> minimax [1..100]
(1,100)
If you also want to calculate the sum and the length:
func = (,,,) <$> minimum <*> maximum <*> sum <*> length
Usage:
> func [1..100]
(1,100,5050,100)
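To be clear about what this does: it uses the Applicative instance for functions, so it elaborates to plain pointful code, which also means the list is still traversed once per combined function:
-- Hand expansion of the Applicative-of-functions version:
-- (,) <$> minimum <*> maximum  ==  \xs -> (minimum xs, maximum xs)
minimaxExpanded :: Ord a => [a] -> (a, a)
minimaxExpanded xs = (minimum xs, maximum xs)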
TL;DR: There are no guarantees about performance in the language itself. None whatsoever. It is a compiler thing.
As a rule of thumb, a named entity will be memory resident. If it is accessed lazily by only one consumer, it is reasonable to expect it to be optimized so that the compiled program runs in constant memory: the creation and consumption of memory cells will be interleaved, and each cell will be GC'd after it has been processed.
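As a rough illustration (this is what one can reasonably expect from GHC with -O2, not a guarantee):
-- Single consumer: the list is produced and consumed in lockstep,
-- so this typically runs in constant space when compiled with -O2.
singlePass :: Int
singlePass = sum [1 .. 10000000]

-- Two consumers of one named list: the list will very likely be
-- retained in memory until the second traversal finishes.
twoPasses :: (Int, Int)
twoPasses = let xs = [1 .. 10000000] in (sum xs, length xs)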
In minimax2 dt = [minimum dt, maximum dt], the expression [minimum dt, maximum dt] is inside the scope where the named entity dt is defined. It is almost certain that GHC will allocate dt as a memory entity once, and that both occurrences of dt inside the expression will refer to that same entity (point to it, as if by pointers).
But as Cat Plus Plus points out in the comments, how the entity is accessed is quite another matter. The two sub-expressions will each access it separately, i.e. the list will be retained in memory in full. That's no good.
We can do better, and find our answer by accessing the list only once, with a fold, collecting the two pieces of data as we go along. In such a situation it is almost certain that GHC will perform an optimization where the list is not retained in memory as a whole.
This is what is usually referred to as the list being consumed lazily. When that's the case, its creation is interleaved with that single access, and each memory cell produced is immediately consumed and released by GC (garbage collection), so that constant-memory operation is achieved.
But that's predicated on our ability to scan through the list only once:
{-# OPTIONS_GHC -O2 -XBangPatterns #-}
import Data.List (foldl')

-- One strict left fold over the list, carrying the running (min, max)
-- pair as the accumulator, so the list is scanned only once.
minmax :: (Ord b) => [b] -> (b, b)
minmax (x:xs) = foldl' (\(!lo,!hi) y -> (min lo y, max hi y)) (x,x) xs
Bang patterns prevent thunk build-up, making evaluation of arguments more eager. Testing:
Prelude> minmax [1..6]
(1,6)
Prelude> minmax []
*** Exception: <interactive>:1:4-65: Non-exhaustive patterns in function minmax
An empty list of course has no minimum nor maximum defined.
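If you'd rather have a total function, here is a sketch of a variant that returns a Maybe so the empty list is covered as well:
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- A total variant of the same single-pass fold: Nothing for the empty list.
minmaxSafe :: Ord b => [b] -> Maybe (b, b)
minmaxSafe []     = Nothing
minmaxSafe (x:xs) = Just (foldl' (\(!lo,!hi) y -> (min lo y, max hi y)) (x,x) xs)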
For the optimizations to kick in, the -O2 flag must be used when compiling with GHC.