Compiling very large constants with GHC

Today I asked GHC to compile an 8MB Haskell source file. GHC thought about it for about 6 minutes, swallowing almost 2GB of RAM, and then finally gave up with an out-of-memory error.

[As an aside, I'm glad GHC had the good sense to abort rather than floor my whole PC.]

Basically I've got a program that reads a text file, does some fancy parsing, builds a data structure and then uses show to dump this into a file. Rather than include the whole parser and the source data in my final application, I'd like to include the generated data as a compile-time constant. By adding some extra stuff to the output from show, you can make it a valid Haskell module. But GHC apparently doesn't enjoy compiling multi-MB source files.
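For illustration, the generation step described above might look roughly like the sketch below. The names `Table`, `renderModule`, and the module name `Generated` are made up for this example, and the element types are a stand-in for the real data. The sketch relies on the `Show` instance for `Data.Map` printing a `fromList [...]` expression, so the generated module only has to bring `fromList` into scope.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for the real parsed structure (a nested map).
type Table = Map.Map String (Map.Map String Int)

-- Wrap the output of `show` in just enough text to form a valid module.
-- The module name `Generated` and the binding name `table` are assumptions.
renderModule :: Table -> String
renderModule t = unlines
  [ "module Generated (table) where"
  , ""
  , "import Data.Map (Map, fromList)"
  , ""
  , "table :: Map String (Map String Int)"
  , "table = " ++ show t   -- one (possibly enormous) line
  ]
```

Writing the result out would then be something like `writeFile "Generated.hs" (renderModule parsed)`, where `parsed` is whatever the parser produced.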

(The weirdest part is, if you just read the data back, it actually doesn't take much time or memory. Strange, considering that both String I/O and read are supposedly very inefficient...)
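By "reading the data back" I mean roughly the following. This is a minimal sketch; `Table`, `parseTable`, and `loadTable` are hypothetical names, and the element types stand in for the real ones. It uses the `Read` instance for `Data.Map`, which accepts the same `fromList [...]` text that `show` produces.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for the real parsed structure (a nested map).
type Table = Map.Map String (Map.Map String Int)

-- Parse the text produced by `show` back into the structure.
-- `read` calls `error` on malformed input, so this is only suitable
-- for files the program generated itself.
parseTable :: String -> Table
parseTable = read

-- Load the dumped structure from a file at run time.
loadTable :: FilePath -> IO Table
loadTable path = parseTable <$> readFile path
```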

I vaguely recall that other people have had trouble getting GHC to compile huge files in the past. FWIW, I tried using -O0, which sped up the crash but did not prevent it. So what is the best way to include large compile-time constants in a Haskell program?

(In my case, the constant is just a nested Data.Map with some interesting labels.)

Initially I thought GHC might just be unhappy at reading a module consisting of one line that's eight million characters long. (!!) Something to do with the layout rule or such. Or perhaps that the deeply-nested expressions upset it. But I tried making each subexpression a top-level identifier, and that was no help. (Adding explicit type signatures to each one did appear to make the compiler slightly happier, however.) Is there anything else I might try to make the compiler's job simpler?
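To make the "top-level identifier" experiment concrete, here is a sketch of how such a module could be generated. The names `renderSplit`, `partN`, and `Generated` are hypothetical, and the types stand in for the real ones. Each inner map gets its own binding with an explicit signature, and the outer map just refers to those bindings by name. As noted above, this did not fix the problem for me, so treat it as an illustration of the attempt rather than a solution.

```haskell
import qualified Data.Map as Map
import Data.List (intercalate)

-- Hypothetical stand-in for the real parsed structure (a nested map).
type Table = Map.Map String (Map.Map String Int)

-- Emit one top-level binding (with a type signature) per inner map,
-- then an outer map that refers to those bindings by name.
renderSplit :: Table -> String
renderSplit t = unlines (header ++ concat inner ++ outer)
  where
    entries = zip [0 :: Int ..] (Map.toList t)
    name i  = "part" ++ show i
    header  = [ "module Generated (table) where"
              , ""
              , "import Data.Map (Map, fromList)"
              , ""
              ]
    inner   = [ [ name i ++ " :: Map String Int"
                , name i ++ " = " ++ show m
                , ""
                ]
              | (i, (_, m)) <- entries ]
    pairs   = [ "(" ++ show k ++ ", " ++ name i ++ ")"
              | (i, (k, _)) <- entries ]
    outer   = [ "table :: Map String (Map String Int)"
              , "table = fromList"
              , "  [ " ++ intercalate "\n  , " pairs ++ "\n  ]"
              ]
```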

In the end, I was able to make the data-structure I'm actually trying to store much smaller. (Like, 300KB.) This made GHC far happier. (And the final application much faster.) But for future reference, I'd be interested to know what the best way to approach this is.