
Does the Swift compiler optimize empty closures?

Tags:

swift

I'm curious to know whether the Swift 1.2 compiler optimizes empty closures. Are the following two statements equivalent?

1:

self.presentViewController(alertController, animated: true) {}

2:

self.presentViewController(alertController, animated: true, completion: nil)

Thanks!

Josh Brown asked Jun 12 '15 14:06



1 Answer

As @rickster suggested, I had a look at the generated x86 assembly of this file (simple.swift):

func thingWithClosure(a: Int, b: (() -> Void)?) {
    println(a)
    b?()
}


thingWithClosure(3) {
    println("i'm a closure")
}

thingWithClosure(5, nil)

thingWithClosure(2) {}

My assembly is pretty rusty, but I can kinda squint at it a bit...

The .main section of the unoptimized generated x86 assembly, or at least part of it, looks like this:

    callq   _swift_once
    movq    __TZvOSs7Process11_unsafeArgvGVSs20UnsafeMutablePointerGS0_VSs4Int8__@GOTPCREL(%rip), %rax
    movq    -64(%rbp), %rcx
    movq    %rcx, (%rax)
    leaq    l_metadata+16(%rip), %rdi
    movl    $32, %r9d
    movl    %r9d, %eax
    movl    $7, %r9d
    movl    %r9d, %edx
    movq    %rax, %rsi
    movq    %rdx, -80(%rbp)
    movq    %rax, -88(%rbp)
    callq   _swift_allocObject
    leaq    __TF6simpleU_FT_T_(%rip), %rcx
    movq    %rcx, 16(%rax)
    movq    $0, 24(%rax)
    leaq    __TPA__TTRXFo__dT__XFo_iT__iT__(%rip), %rcx
    movq    %rcx, -16(%rbp)
    movq    %rax, -8(%rbp)
    movq    -16(%rbp), %rsi
    movl    $3, %r9d
    movl    %r9d, %edi
    movq    %rax, %rdx
        --> callq   __TF6simple16thingWithClosureFTSiGSqFT_T___T_
    movq    $0, -24(%rbp)
    movq    $0, -32(%rbp)
    movl    $5, %r9d
    movl    %r9d, %edi
    movq    -72(%rbp), %rsi
    movq    -72(%rbp), %rdx
        --> callq   __TF6simple16thingWithClosureFTSiGSqFT_T___T_
    leaq    l_metadata2+16(%rip), %rdi
    movq    -88(%rbp), %rsi
    movq    -80(%rbp), %rdx
    callq   _swift_allocObject
    leaq    __TF6simpleU0_FT_T_(%rip), %rcx
    movq    %rcx, 16(%rax)
    movq    $0, 24(%rax)
    leaq    __TPA__TTRXFo__dT__XFo_iT__iT__3(%rip), %rcx
    movq    %rcx, -48(%rbp)
    movq    %rax, -40(%rbp)
    movq    -48(%rbp), %rsi
    movl    $2, %r9d
    movl    %r9d, %edi
    movq    %rax, %rdx
        --> callq   __TF6simple16thingWithClosureFTSiGSqFT_T___T_
    xorl    %eax, %eax
    addq    $96, %rsp
    popq    %rbp
    retq
    .cfi_endproc

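I've pointed out where the function is called with -->. Looking a few instructions up from each callq instruction, you can see where the a argument (3, 5, or 2) is moved into the r9d register and then copied into edi. Note also that both closure calls, including the empty one, are preceded by a call to _swift_allocObject, while the nil call is not.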

Similarly, the optimized output:

    callq   _swift_once
    movq    __TZvOSs7Process11_unsafeArgvGVSs20UnsafeMutablePointerGS0_VSs4Int8__@GOTPCREL(%rip), %rax
    movq    %r14, (%rax)
    movq    $3, -24(%rbp)
    movq    __TMdSi@GOTPCREL(%rip), %rbx
    addq    $8, %rbx
    leaq    -24(%rbp), %rdi
    movq    %rbx, %rsi
    callq   __TFSs7printlnU__FQ_T_
    leaq    L___unnamed_1(%rip), %rax
    movq    %rax, -48(%rbp)
    movq    $13, -40(%rbp)
    movq    $0, -32(%rbp)
    movq    __TMdSS@GOTPCREL(%rip), %rsi
    addq    $8, %rsi
    leaq    -48(%rbp), %rdi
        --> callq   __TFSs7printlnU__FQ_T_
    movq    $5, -56(%rbp)
    leaq    -56(%rbp), %rdi
    movq    %rbx, %rsi
        --> callq   __TFSs7printlnU__FQ_T_
    movq    $2, -64(%rbp)
    leaq    -64(%rbp), %rdi
    movq    %rbx, %rsi
        --> callq   __TFSs7printlnU__FQ_T_
    xorl    %eax, %eax
    addq    $48, %rsp
    popq    %rbx
    popq    %r14
    popq    %rbp
    retq
    .cfi_endproc

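Here, the compiler has inlined the function, so the --> markers point at println calls instead. The first, unmarked println prints 3; the marked ones print the closure's string, then 5, then 2. No call to thingWithClosure or _swift_allocObject remains.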
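To make the inlining concrete, here is a rough sketch in current Swift syntax rather than Swift 1.2. The Recorder class and the two wrapper functions are hypothetical helpers added for illustration (they collect output so the two versions can be compared), and the hand-inlined version is only a reading of the assembly above, not compiler output.

```swift
// Hypothetical helper: collects what would have been printed.
final class Recorder {
    private(set) var lines: [String] = []
    func record(_ value: Any) { lines.append("\(value)") }
}

// Same shape as the function in simple.swift, recording instead of printing.
func thingWithClosure(_ r: Recorder, _ a: Int, _ b: (() -> Void)?) {
    r.record(a)
    b?()
}

// The three call sites as written in the source file.
func asWritten() -> [String] {
    let r = Recorder()
    thingWithClosure(r, 3) { r.record("i'm a closure") }
    thingWithClosure(r, 5, nil)
    thingWithClosure(r, 2) {}
    return r.lines
}

// What the optimized assembly appears to boil down to after inlining:
// the nil call and the empty-closure call each leave one output call behind.
func inlinedByHand() -> [String] {
    let r = Recorder()
    r.record(3)
    r.record("i'm a closure")
    r.record(5)
    r.record(2)
    return r.lines
}

print(asWritten() == inlinedByHand())
```

Both functions produce the same four lines, which matches the four println calls in the optimized listing.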

I took an intro to x86 assembly on an emulated 16-bit CPU years ago, so I'm not going to pretend I know exactly what's going on here. Still, it appears that when compiled with -O, the compiler emits roughly equivalent code for the nil and empty-closure cases, at least in terms of instruction count (maybe not in terms of memory look-ups, etc.): each reduces to a single println call. The println calls are interspersed with leaq (load effective address) instructions. leaq only computes an address and does not jump anywhere; here it appears to load the address of either a stack slot holding the value to print or static data such as the string literal (L___unnamed_1).

In the unoptimized version, the empty-closure call takes noticeably more instructions than the nil call, including a call to _swift_allocObject, so the major difference may be debug-build performance.
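Independently of what the optimizer does, the two forms are not the same thing at the source level: with {} the callee receives a non-nil closure and calls it, while with nil it receives nothing to call. A minimal sketch in current Swift syntax, where present(animated:completion:) is a hypothetical stand-in and not UIKit's implementation:

```swift
// Hypothetical stand-in for an API with an optional completion handler.
func present(animated: Bool, completion: (() -> Void)? = nil) -> String {
    if let completion = completion {
        completion()                 // an empty closure is still invoked
        return "closure invoked"
    }
    return "no completion"
}

let withEmptyClosure = present(animated: true) {}
let withNil = present(animated: true, completion: nil)

print(withEmptyClosure)
print(withNil)
```

Whether that difference costs anything at run time is exactly what the optimized and unoptimized listings above are probing.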

Of course, this is x86, so it may be entirely different on ARM. Perhaps the ARM assembly, LLVM IR, or SIL (Swift Intermediate Language) output would shed more light?
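For anyone who wants to dig further, here is a sketch of the same test file in current Swift syntax, with the swiftc invocations listed as comments. The @inline(never) attribute and the closureRuns counter are additions for illustration: the attribute is meant to keep the call sites visible in optimized output instead of being inlined away, and the counter only confirms that exactly one closure does observable work.

```swift
// Possible ways to inspect the output (simple.swift is the file name
// used in this answer):
//   swiftc -Onone -emit-assembly simple.swift   // unoptimized assembly
//   swiftc -O -emit-assembly simple.swift       // optimized assembly
//   swiftc -O -emit-sil simple.swift            // optimized SIL
//   swiftc -O -emit-ir simple.swift             // LLVM IR

@inline(never)  // assumption: added so the optimizer keeps the calls visible
func thingWithClosure(_ a: Int, _ b: (() -> Void)?) {
    print(a)
    b?()
}

var closureRuns = 0  // illustrative counter, not in the original file

thingWithClosure(3) { closureRuns += 1 }  // closure with a body
thingWithClosure(5, nil)                  // no closure at all
thingWithClosure(2) {}                    // empty closure

print(closureRuns)
```

Comparing how the second and third call sites are set up in each output format should show whether the empty closure still costs an allocation on a given compiler version and target.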

If anyone with a better understanding can clarify, I'll gladly update this answer.

Ralfonso answered Oct 21 '22 06:10
