To measure the performance of those Refs, I dumped the assembly produced by GHC on the following code:
import Data.IORef
main = do
  r <- newIORef 18
  v <- readIORef r
  print v
I expected the IORef to be completely optimized away, leaving only a write syscall to stdout with the string "18". Instead I get 250 lines of assembly. Do you know how many will actually be executed? Here is what I think is the heart of the program:
.globl Main.main1_info
Main.main1_info:
_c1Zi:
leaq -8(%rbp),%rax
cmpq %r15,%rax
jb _c1Zj
_c1Zk:
movq $block_c1Z9_info,-8(%rbp)
movl $Main.main2_closure+1,%ebx
addq $-8,%rbp
jmp stg_newMutVar#
_c1Zn:
movq $24,904(%r13)
jmp stg_gc_unpt_r1
.align 8
.long S1Zo_srt-(block_c1Z9_info)+0
.long 0
.quad 0
.quad 30064771104
block_c1Z9_info:
_c1Z9:
addq $24,%r12
cmpq 856(%r13),%r12
ja _c1Zn
_c1Zm:
movq 8(%rbx),%rax
movq $sat_s1Z2_info,-16(%r12)
movq %rax,(%r12)
movl $GHC.Types.True_closure+2,%edi
leaq -16(%r12),%rsi
movl $GHC.IO.Handle.FD.stdout_closure,%r14d
addq $8,%rbp
jmp GHC.IO.Handle.Text.hPutStr2_info
_c1Zj:
movl $Main.main1_closure,%ebx
jmp *-8(%r13)
I am concerned about this jmp stg_newMutVar#. It appears nowhere else in the assembly, so maybe GHC resolves it at a later linking stage. But why is it even here, and what does it do? Can I dump the final assembly without any unresolved Haskell symbols?
Starting with a couple of links: the MutVar object definition, and the .cmm code for newMutVar. The Cmm and C sources aren't particularly readable if you're not already familiar with the macros and primops. Unfortunately, I don't know of a good way to view the assembly generated for Cmm primops, short of looking into an executable with objdump or some other disassembler.
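For example, a sketch of that objdump workflow (the exact symbol name and output layout depend on your GHC version and platform, so treat the sed range as an assumption to adjust):

```shell
# Compile the program, then disassemble the fully linked executable,
# printing only the stg_newMutVar code if that symbol is present.
ghc -O2 Main.hs
objdump -d Main | sed -n '/<stg_newMutVar>:/,/^$/p'
```

In the linked binary all RTS symbols are resolved, so this answers the "final assembly" part of the question, at the cost of reading raw disassembly.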
Still, I can summarize the runtime semantics of IORef.
IORef is a wrapper around MutVar# from GHC.Prim. As the docs say, MutVar# is like a single-element mutable array. It takes up two machine words: the first is the header, the second is the stored value (which is a pointer to a GHC heap object). A value of type MutVar# is itself a pointer to this two-word object.
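As a rough sketch of that wrapping (not GHC's actual GHC.IORef/GHC.STRef modules, but the same shape, assuming the usual GHC.Exts exports), an IORef-like type can be built directly on MutVar#:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Hypothetical MyRef: a lifted wrapper around the MutVar# primop,
-- mirroring how IORef bottoms out in a two-word mutable heap object.
import GHC.Exts (MutVar#, RealWorld, newMutVar#, readMutVar#)
import GHC.IO (IO(..))

data MyRef a = MyRef (MutVar# RealWorld a)

-- Allocate a new mutable variable holding x.
newMyRef :: a -> IO (MyRef a)
newMyRef x = IO (\s -> case newMutVar# x s of
                         (# s', var #) -> (# s', MyRef var #))

-- Read the current contents.
readMyRef :: MyRef a -> IO a
readMyRef (MyRef var) = IO (readMutVar# var)

main :: IO ()
main = do
  r <- newMyRef (18 :: Int)
  v <- readMyRef r
  print v  -- prints 18
```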
MutVars differ from normal immutable objects most notably by participating in a write-barrier mechanism. GHC has generational garbage collection, so any MutVar that lives in an older generation must also be a GC root when collecting the younger generations, since mutating a MutVar may cause younger objects to become reachable. Therefore, whenever a MutVar is promoted out of generation 0 (the youngest), it is added to a so-called "mutable list" that contains references to all such mutable objects. The mutable list gets rebuilt during GC of the old generations. In short, MutVars in old generations are always present on the mutable list.
This is a rather simplistic way of dealing with mutable variables, and if we have large numbers of them in old generations, minor garbage collection slows down because of the bloated mutable list, and as a result the entire program slows down.
Since mutable variables aren't used prominently in production code, there hasn't been much demand or pressure for optimizing the RTS for their heavy usage.
If you need a large number of mutable variables, you should instead use a single mutable boxed array, because that's only a single reference on the mutable list and also has a bitmap-based optimization for GC traversal of elements that might have been mutated.
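For instance, a minimal sketch of the single-array approach using IOArray from the array package (any boxed mutable array would illustrate the same point):

```haskell
-- One IOArray holding ten mutable slots: a single entry on the
-- mutable list, instead of ten separate IORefs.
import Data.Array.IO

main :: IO ()
main = do
  arr <- newArray (0, 9) 0 :: IO (IOArray Int Int)
  writeArray arr 3 42
  v <- readArray arr 3
  print v  -- prints 42
```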
Also, as you can see, newMutVar# is only statically linked but not inlined, even though it's a rather small chunk of code. As a result, it's also not optimized away. This is again broadly because of the lack of effort and attention toward optimizing mutating code. By contrast, allocating and copying small known-sized primitive arrays is currently inlined and heavily optimized, because Johan Tibell, who did a large amount of work implementing the unordered-containers library, made it so (in order to make unordered-containers faster).