I'm curious how to optimize this code:
fun n = (sum l, f $ f0 l, g $ g0 l)
  where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be fixed by writing:
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
Or the very similar:

fun n = (a,b,c) `deepseq` (a, f b, g c)
  where ...
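Spelled out, a minimal sketch of that deepseq version, assuming the same where clause as above, the deepseq function from Control.DeepSeq, and NFData instances for the result types of f0 and g0:

import Control.DeepSeq (deepseq)
import GHC.Exts (inline)

fun n = (a, b, c) `deepseq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l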
We could perhaps specify a bunch of internal types to achieve the same effect, but that looks painful. Are there any other options?
Also, I'm obviously hoping with my uses of inline that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or to compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation, perhaps?
As an aside, I'm curious why seq, inline, lazy, etc. are all defined as let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source formulations and check whether the generated Core/assembly/LLVM bitcode/whatever does what you want. But that could break with each new compiler version.
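With GHC, for instance, one way to check is to compile with the Core dump flags and read the output (Fun.hs being a hypothetical name for whatever module holds fun):

ghc -O2 -ddump-simpl -dsuppress-all Fun.hs

The -dsuppress-all flag hides type applications and other noise, which makes it easier to see whether the consumers of l were merged into one loop.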
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b, and c to be performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that, the structure of the computations of b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with an appropriate accumulator type and combining function, like the famous average:
import Data.List (foldl')

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
  where
    ratio (P n s)   = s / fromIntegral n
    count (P n s) x = P (n+1) (s+x)
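If fun's f0 and g0 do fit that pattern, the same trick merges all three computations into one pass. A minimal sketch, where h, f, g, f0Step, and g0Step are hypothetical placeholders standing in for the real (costly) functions:

{-# LANGUAGE BangPatterns #-}

-- Strict accumulator holding all three running results at once.
data Acc = Acc !Double !Double !Double

fun :: Int -> (Double, Double, Double)
fun n = (s, f b, g c)   -- f and g stay delayed; s, b, c come from one pass
  where
    Acc s b c = go (Acc 0 0 0) [1 .. n]

    -- One traversal: each h i is produced once and consumed by all
    -- three folds, so the list l never needs to exist.
    go !acc [] = acc
    go (Acc s' b' c') (i:is) =
      let x = h i
      in  go (Acc (s' + x) (f0Step b' x) (g0Step c' x)) is

    -- Hypothetical stand-ins for the real functions:
    h = fromIntegral . (* 2) :: Int -> Double
    f0Step = max    -- assumed left-fold step of f0
    g0Step = min    -- assumed left-fold step of g0
    f = (* 10)      -- costly post-processing, only run on demand
    g = negate

Since s, b, and c all come from the one Acc that go produces, demanding any of them runs the single traversal that computes all three, while f and g fire only when the corresponding tuple components are demanded, which matches the evaluation order asked for in the question.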
but if the structure of f0 and/or g0 doesn't fit, say one is a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult to achieve if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
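If you only want to disable that sharing for one module, the same flags can be set in an OPTIONS_GHC pragma at the top of the file (module name hypothetical):

{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}
module Fun where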