I found a strange thing about GHCi and lists.
This command takes some time to execute and just returns the right answer.
ghci> length [1..10^8]
100000000
However, binding this to a variable and executing causes GHC to consume about 5 GiB of RAM without releasing until the GHCi session is over. Typing :quit
after it consumes 3 GiB more before actually exiting.
ghci> len = length [1..10^8]
ghci> len
-- Consumes 5 GiB
100000000
ghci> :quit
-- Consumes 3 GiB
-- Exits
Is this normal? What is the difference between the commands?
GHC version is 8.2.2.
Update: The optimization performed by -O0
is a little different than I first understood. Also, added a note about filing a new Trac bug.
I can reproduce this in GHC 8.2.2. Directly evaluating the expression (or using let
to bind it to a variable and then evaluating it) both complete quickly:
Prelude> length [1..10^8]
10000000 -- pretty fast
Prelude> let len = length [1..10^8]
Prelude> len
10000000 -- pretty fast
Prelude>
However, using the let
-free syntax:
Prelude> len = length [1..10^8]
Prelude> len
10000000
Prelude>
takes longer and allocates a lot of memory which isn't freed until the session ends.
Note that this is specific to GHCi and interactive mode -- in a real, compiled Haskell program, there would be no issue. Compiling the following will run quickly and not consume excess memory:
len = length [1..10^8]
main = print len
To understand what's going on, you should understand that Haskell is capable of performing two potential optimizations of this code:
When this code is compiled with optimizations (-O1
or -O2
), GHC will perform optimization #2. The compiled version will run quickly in a small amount of constant memory (a couple of megabytes resident for the runtime). If you run this with:
$ time ./Length +RTS -s
to collect statistics, you'll find that GHC is still allocating about 1.6 gigabytes of heap, but this is actually to store the individual Integer
values as they are being incremented. (Since values in Haskell are immutable, a new Integer
must be allocated for every increment.) If you force the type to be Int
:
len = length [(1::Int)..10^8]
then the program will allocate only a few kilobytes of heap, and you can see that truly there's no list being allocated.
It turns out that when this code is compiled without optimizations (-O0
), GHC only performs optimization #1 (as pointed out by @Carl), but it manages to do a really good job of it, so much so that even though the GHC statistics show lots of heap allocation, the program still runs quite quickly with a very small memory footprint.
However, when this code is compiled to byte-code in GHCi, not only is just optimization #1 used, but GHC doesn't do quite so good a job of garbage collecting the list. A huge multi-gigabyte list is generated, and the beginning is garbage collected nearly as fast as it's being generated. Memory use ends up being quite large, but at least it's relatively constant.
You can see this by turning on timing/memory stats:
> :set +s
> length [1..10^8]
100000000
(1.54 secs, 7,200,156,128 bytes)
>
This means that this code actually allocates a 7.2 gigabyte list; fortunately, it can be thrown away almost as fast as it's generated, so the memory in use by the GHCi process after this computation will still be reasonably modest.
You'll see that:
> let len = length [1..10^8]
> len
and:
> len = length [1..10^8]
> len
chew through exactly the same enormous amount of memory (about 7.2 gigs).
The difference is that, for some reason, the let
version allows the list to be garbage collected as it's counted, and the non-let
version doesn't.
In the end, this is almost certainly a GHCi bug. It might be related to one of the existing space-leak bugs that have been reported (e.g., Trac #12848 or #14336), or maybe it's a new one. I decided to file it as #14789, so maybe someone will take a look at it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With