To measure the performance of those Refs, I dumped the assembly produced by GHC on the following code:
import Data.IORef
main = do
  r <- newIORef 18
  v <- readIORef r
  print v
I expected the IORef to be completely optimized away, leaving only a write syscall to stdout with the string "18". Instead I get 250 lines of assembly. Do you know how many will actually be executed? Here is what I think is the heart of the program:
.globl Main.main1_info
Main.main1_info:
_c1Zi:
leaq -8(%rbp),%rax
cmpq %r15,%rax
jb _c1Zj
_c1Zk:
movq $block_c1Z9_info,-8(%rbp)
movl $Main.main2_closure+1,%ebx
addq $-8,%rbp
jmp stg_newMutVar#
_c1Zn:
movq $24,904(%r13)
jmp stg_gc_unpt_r1
.align 8
.long S1Zo_srt-(block_c1Z9_info)+0
.long 0
.quad 0
.quad 30064771104
block_c1Z9_info:
_c1Z9:
addq $24,%r12
cmpq 856(%r13),%r12
ja _c1Zn
_c1Zm:
movq 8(%rbx),%rax
movq $sat_s1Z2_info,-16(%r12)
movq %rax,(%r12)
movl $GHC.Types.True_closure+2,%edi
leaq -16(%r12),%rsi
movl $GHC.IO.Handle.FD.stdout_closure,%r14d
addq $8,%rbp
jmp GHC.IO.Handle.Text.hPutStr2_info
_c1Zj:
movl $Main.main1_closure,%ebx
jmp *-8(%r13)
I am concerned about this jmp stg_newMutVar#. It appears nowhere else in the assembly, so maybe GHC resolves it at a later linking stage. But why is it even here, and what does it do? Can I dump the final assembly without any unresolved Haskell symbols?
Starting with a couple of links: the MutVar object definition, and the .cmm code for newMutVar. The Cmm and C sources aren't particularly readable if you're not already familiar with the macros and primops. Unfortunately, I don't know of a good way to view the assembly generated for Cmm primops, short of looking into an executable with objdump or some other disassembler.
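For example, a sketch of that objdump workflow (the exact symbol name and output layout depend on your GHC version and platform, so treat the sed range as an assumption to adjust):

```shell
# Compile the program, then disassemble the fully linked executable,
# printing only the stg_newMutVar code if that symbol is present.
ghc -O2 Main.hs
objdump -d Main | sed -n '/<stg_newMutVar>:/,/^$/p'
```

In the linked binary all RTS symbols are resolved, so this answers the "final assembly" part of the question, at the cost of reading raw disassembly.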
Still, I can summarize the runtime semantics of IORef.
IORef is a wrapper around MutVar# from GHC.Prim. As the docs say, MutVar# is like a single-element mutable array. It takes up two machine words: the first is the header, the second is the stored value (which is a pointer to a GHC heap object). A value of type MutVar# is itself a pointer to this two-word object.
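As a rough sketch of that wrapping (not GHC's actual GHC.IORef/GHC.STRef modules, but the same shape, assuming the usual GHC.Exts exports), an IORef-like type can be built directly on MutVar#:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Hypothetical MyRef: a lifted wrapper around the MutVar# primop,
-- mirroring how IORef bottoms out in a two-word mutable heap object.
import GHC.Exts (MutVar#, RealWorld, newMutVar#, readMutVar#)
import GHC.IO (IO(..))

data MyRef a = MyRef (MutVar# RealWorld a)

-- Allocate a new mutable variable holding x.
newMyRef :: a -> IO (MyRef a)
newMyRef x = IO (\s -> case newMutVar# x s of
                         (# s', var #) -> (# s', MyRef var #))

-- Read the current contents.
readMyRef :: MyRef a -> IO a
readMyRef (MyRef var) = IO (readMutVar# var)

main :: IO ()
main = do
  r <- newMyRef (18 :: Int)
  v <- readMyRef r
  print v  -- prints 18
```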
MutVars differ from normal immutable objects most notably by participating in a write-barrier mechanism. GHC has generational garbage collection, so any MutVar that lives in an older generation must also be a GC root when collecting the younger generations, since mutating a MutVar may cause younger objects to become reachable. Therefore, whenever a MutVar is promoted out of generation 0 (the youngest), it is added to a so-called "mutable list" that contains references to all such mutable objects. The mutable list gets rebuilt during GC of the old generations. In short, MutVars in old generations are always present on the mutable list.
This is a rather simplistic way of dealing with mutable variables, and if we have large numbers of them in old generations, minor garbage collection slows down because of the bloated mutable list, and as a result the entire program slows down.
Since mutable variables aren't used prominently in production code, there hasn't been much demand or pressure for optimizing the RTS for their heavy usage.
If you need a large number of mutable variables, you should instead use a single mutable boxed array, because that's only a single reference on the mutable list and also has a bitmap-based optimization for GC traversal of elements that might have been mutated.
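For instance, a minimal sketch of the single-array approach using IOArray from the array package (any boxed mutable array would illustrate the same point):

```haskell
-- One IOArray holding ten mutable slots: a single entry on the
-- mutable list, instead of ten separate IORefs.
import Data.Array.IO

main :: IO ()
main = do
  arr <- newArray (0, 9) 0 :: IO (IOArray Int Int)
  writeArray arr 3 42
  v <- readArray arr 3
  print v  -- prints 42
```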
Also, as you can see, newMutVar# is only statically linked but not inlined, even though it's a rather small chunk of code. As a result, it's also not optimized away. This is again broadly because of the lack of effort and attention toward optimizing mutating code. By contrast, allocating and copying small known-sized primitive arrays is currently inlined and heavily optimized, because Johan Tibell, who did a large amount of work implementing the unordered-containers library, made it so (in order to make unordered-containers faster).