
Speed up runhaskell

Tags: haskell, ghc

I have a small test framework. It executes a loop which does the following:

  1. Generate a small Haskell source file.

  2. Execute this with runhaskell. The program generates various disk files.

  3. Process the disk files just generated.

This happens a few dozen times. It turns out that runhaskell is taking up the vast majority of the program's execution time.

On one hand, the fact that runhaskell manages to load a file from disk, tokenise it, parse it, do dependency analysis, load 20KB more text from disk, tokenise and parse all of this, perform complete type inference, check types, desugar to Core, link against compiled machine code, and execute the thing in an interpreter, all inside of 2 seconds of wall time, is actually pretty damned impressive when you think about it. On the other hand, I still want to make it go faster. ;-)

Compiling the tester (the program that runs the above loop) produced a tiny performance difference. Compiling the 20KB of library code that the scripts link against produced a rather more noticeable improvement. But it's still taking about 1 second per invocation of runhaskell.

The generated Haskell files are just over 1KB each, but only one part of the file actually changes. Perhaps compiling the file and using GHC's -e switch would be faster?

Alternatively, maybe it's the overhead of repeatedly creating and destroying many OS processes which is slowing this down? Every invocation of runhaskell presumably causes the OS to explore the system search path, locate the necessary binary file, load it into memory (surely this is already in the disk cache?), link it against whatever DLLs, and fire it up. Is there some way I can (easily) keep one instance of GHC running, rather than having to constantly create and destroy the OS process?

Ultimately, I suppose there's always the GHC API. But as I understand it, that's nightmarishly difficult to use, poorly documented, and prone to radical changes at every minor point release of GHC. The task I'm trying to perform is very simple, so I don't really want to make things more complex than necessary.

Suggestions?

Update: Switching to GHC -e (i.e., now everything is compiled except the one expression being executed) made no measurable performance difference. It seems pretty clear at this point that it's all OS overhead. I'm wondering if I could maybe create a pipe from the tester to GHCi and thus make use of just one OS process...

MathematicalOrchid asked Feb 17 '12 09:02


3 Answers

Alright, I have a solution: I created a single GHCi process and connected its stdin to a pipe, so that I can send it expressions to interactively evaluate.

Several fairly large program refactorings later, the entire test suite now takes roughly 8 seconds to execute, rather than 48 seconds. That'll do for me! :-D

(To anyone else trying to do this: For the love of God, remember to pass the -v0 switch to GHCi, or you'll get a GHCi welcome banner! Weirdly, if you run GHCi interactively, even with -v0 the command prompt still appears, but when connected to a pipe the command prompt vanishes; I'm presuming this is a helpful design feature rather than a random accident.)
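For anyone wanting a starting point, here is a minimal sketch of the single-GHCi-process idea (not the actual tester code). It uses System.Process from the process package and assumes ghci is on the PATH. The names testExprs and sessionInput are hypothetical, and the expressions are placeholders for the real generated test code. Only stdin is piped, so GHCi's output simply goes to the tester's own stdout.

```haskell
import System.IO (hClose, hPutStr)
import System.Process

-- Hypothetical placeholders for the expressions the tester generates.
testExprs :: [String]
testExprs = ["1 + 1", "putStrLn \"hello from GHCi\""]

-- Text written to GHCi's stdin: one expression per line, then :quit.
sessionInput :: [String] -> String
sessionInput exprs = unlines (exprs ++ [":quit"])

main :: IO ()
main = do
  -- Start one GHCi for the whole run; -v0 suppresses the welcome banner.
  -- Assumption: "ghci" can be found on the PATH.
  (Just hin, _, _, ph) <-
    createProcess (proc "ghci" ["-v0"]) { std_in = CreatePipe }
  -- Feed every expression through the same pipe instead of
  -- spawning a fresh process per test.
  hPutStr hin (sessionInput testExprs)
  hClose hin              -- flushes the pipe and signals end of input
  _ <- waitForProcess ph  -- wait for GHCi to finish evaluating and exit
  return ()
```

The point of the design is that the startup cost of GHCi is paid once, however many expressions are sent down the pipe.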


Of course, half the reason I'm going down this strange route is that I want to capture stdout and stderr to a file. Using runhaskell, that's quite easy; just pass the appropriate options when creating the child process. But now all of the test cases are being run by a single OS process, so there's no obvious way to redirect stdout and stderr separately for each test.

The solution I came up with was to direct all test output to a single file, and between tests have GHCi print out a magic string which (I hope!) won't appear in test output. Then quit GHCi, slurp up the file, and look for the magic strings so I can snip the file into suitable chunks.
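A rough sketch of the magic-string trick, for illustration only. The marker text and the names marker, markedInput and splitOnMarker are all hypothetical. It assumes each test's output ends with a newline, so that the marker always lands on a line of its own. The sample log in main is made up to show the shape of the data, not captured from a real run.

```haskell
-- Hypothetical marker; pick anything that cannot occur in real test output.
marker :: String
marker = "<<<END-OF-TEST>>>"

-- Input for GHCi: after each test expression, print the marker,
-- and finish the session with :quit.
markedInput :: [String] -> String
markedInput exprs = unlines (concatMap step exprs ++ [":quit"])
  where
    step e = [e, "putStrLn " ++ show marker]

-- Split the lines of the captured log into one chunk per test.
-- Every marker line closes the chunk that precedes it.
splitOnMarker :: String -> [String] -> [[String]]
splitOnMarker m ls = case break (== m) ls of
  (chunk, [])       -> [chunk | not (null chunk)]  -- trailing output, if any
  (chunk, _ : rest) -> chunk : splitOnMarker m rest

main :: IO ()
main = do
  -- What would be sent to GHCi for two placeholder tests.
  putStr (markedInput ["print (1 + 1)", "putStrLn \"hi\""])
  -- Made-up log contents standing in for the slurped output file.
  let captured = ["2", marker, "hi", marker]
  mapM_ print (splitOnMarker marker captured)
```

A test that prints nothing still yields an (empty) chunk, so the chunks stay aligned with the list of tests.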

MathematicalOrchid answered Nov 15 '22 06:11


You might find some useful code in TBC. It has different ambitions - in particular to scrap test boilerplate and test projects that may not compile completely - but it could be extended with a watch-directory feature. The tests are run in GHCi but objects successfully built by cabal ("runghc Setup build") are used.

I developed it to test EDSLs with complicated type hackery, i.e. where the heavy computational lifting is done by other libraries.

I am presently updating it to the latest Haskell Platform and welcome any comments or patches.

Peter Gammie answered Nov 15 '22 05:11


If most of the source files remain unchanged, you may be able to use GHC's -fobject-code flag (possibly together with -outputdir) to compile some of the library files.
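As a small illustration, this is how the argument list for such a GHCi invocation might be assembled from a Haskell test driver. The function name ghciArgs, the directory name ghci-objs and the module TestLib.hs are made up for the example; only the flags themselves come from GHC.

```haskell
-- Arguments for a GHCi session that compiles the modules it loads to
-- object code and keeps the resulting .o/.hi files in objDir, so that
-- unchanged modules need not be recompiled on later runs.
ghciArgs :: FilePath -> [FilePath] -> [String]
ghciArgs objDir modules =
  ["-v0", "-fobject-code", "-outputdir", objDir] ++ modules

main :: IO ()
main =
  -- Print the command line that would be run (hypothetical file names).
  putStrLn (unwords ("ghci" : ghciArgs "ghci-objs" ["TestLib.hs"]))
```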

ivanm answered Nov 15 '22 04:11