Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using referential-transparency to pre-compute values in haskell

Let's say we have a program like this :

list = [1..10000000]

main = print $ sum list

I want this to be compiled such that the executable just prints 50000005000000, without taking as much time and effort.

Basically, any expression that will be surely computed (maybe the strictness analysis can help here) can be pre-computed during compilation time (ie we use referential-transparency to say it doesn't really matter when we compute the value).

In short : "has to be computed" + referential-transparency = can be pre-computed

This would be like running the program till we hit something that depends upon the input (i.e. the core of the program, that which is common across all inputs, will be pre-computed)

Is there an existing mechanism to achieve this currently (in Haskell or any other language)? [Please don't point to something like templates in C++ as it doesn't have referential-transparency in the first place.]

If not, how tough is this problem ? [What are the accompanying technical (and theoretical) issues ?]

like image 211
Karan Avatar asked Apr 10 '12 17:04

Karan


2 Answers

The general-purpose answer to doing compile-time computation is to use Template Haskell. But for this specific use case, you can use the vector package and the LLVM backend, and GHC will optimize away this sum.

sorghum:~% cat >test.hs
import Data.Vector.Unboxed as V
main = print (V.sum $ V.enumFromN (1 :: Int) 10000000)
sorghum:~% ghc --make -O2 -fllvm test.hs
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
sorghum:~% time ./test
50000005000000
./test  0.00s user 0.00s system 0% cpu 0.002 total
sorghum:~% objdump -d test | grep 2d7988896b40
  40653d:   48 bb 40 6b 89 88 79    movabs $0x2d7988896b40,%rbx
  406547:   49 be 40 6b 89 88 79    movabs $0x2d7988896b40,%r14

(In case it's not immediately obvious, 0x2d79888896b40 is 50000005000000.)

like image 195
Daniel Wagner Avatar answered Oct 08 '22 10:10

Daniel Wagner


This is not safe in the general case. The reason is that Haskell expressions may be pure, but they can also fail to terminate. The compiler should always terminate, so the best you could do would be "evaluate 1,000 steps of this result".1 But if you do add such a limit, what if you're compiling a program to run on a supercomputer cluster with terabytes of RAM, and the compiler runs out of memory?

You can add lots of limits, but in the end you'll reduce the optimisation to a slow form of constant folding (especially so for the majority of programs whose computations depend on run-time user input). And since sum [1..10000000] takes half a second here, it's unlikely it'd fall under any reasonable limit.

Of course, specific cases like this one can often be optimised away, and GHC often does optimise away complicated expressions like this. But the compiler can't just evaluate any expression at compile-time safely; it would have to be very constrained, and it's arguable how helpful it would be under those constraints. It's a compiler, not an interpreter :)

1 Which would massively slow down the compilation of any program which does contain a lot of infinite loops — which, since Haskell is non-strict, is more likely than you might think). Or, more commonly, any program which contains a lot of long-running computations.

like image 28
ehird Avatar answered Oct 08 '22 08:10

ehird