Just curious which is more efficient/better in Swift. This is perhaps best explained through an example:
var one = Object()
var two = Object()
var three = Object()

func firstFunction() {
    let tempVar1 = // calculation1
    one = tempVar1
    let tempVar2 = // calculation2
    two = tempVar2
    let tempVar3 = // calculation3
    three = tempVar3
}
func secondFunction() {
    var tempVar = // calculation1
    one = tempVar
    tempVar = // calculation2
    two = tempVar
    tempVar = // calculation3
    three = tempVar
}
Which of the two functions is more efficient? Thank you for your time!
Not to be too cute about it, but the most efficient version of your code above is:
var one = Object()
var two = Object()
var three = Object()
That is logically equivalent to all the code you've written since you never use the results of the computations (assuming the computations have no side-effects). It is the job of the optimizer to get down to this simplest form. Technically the simplest form is:
func main() {}
The optimizer isn't quite that smart. But it really is smart enough to get to my first example. Consider this program:
var one = 1
var two = 2
var three = 3

func calculation1() -> Int { return 1 }
func calculation2() -> Int { return 2 }
func calculation3() -> Int { return 3 }

func firstFunction() {
    let tempVar1 = calculation1()
    one = tempVar1
    let tempVar2 = calculation2()
    two = tempVar2
    let tempVar3 = calculation3()
    three = tempVar3
}

func secondFunction() {
    var tempVar = calculation1()
    one = tempVar
    tempVar = calculation2()
    two = tempVar
    tempVar = calculation3()
    three = tempVar
}

func main() {
    firstFunction()
    secondFunction()
}
Run it through the compiler with optimizations:
$ swiftc -O -wmo -emit-assembly x.swift
Here's the whole output:
    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 9
    .globl  _main
    .p2align    4, 0x90
_main:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    $1, __Tv1x3oneSi(%rip)
    movq    $2, __Tv1x3twoSi(%rip)
    movq    $3, __Tv1x5threeSi(%rip)
    xorl    %eax, %eax
    popq    %rbp
    retq
    .private_extern __Tv1x3oneSi
    .globl  __Tv1x3oneSi
    .zerofill   __DATA,__common,__Tv1x3oneSi,8,3
    .private_extern __Tv1x3twoSi
    .globl  __Tv1x3twoSi
    .zerofill   __DATA,__common,__Tv1x3twoSi,8,3
    .private_extern __Tv1x5threeSi
    .globl  __Tv1x5threeSi
    .zerofill   __DATA,__common,__Tv1x5threeSi,8,3
    .private_extern ___swift_reflection_version
    .section    __TEXT,__const
    .globl  ___swift_reflection_version
    .weak_definition    ___swift_reflection_version
    .p2align    1
___swift_reflection_version:
    .short  1
    .no_dead_strip  ___swift_reflection_version
    .linker_option "-lswiftCore"
    .linker_option "-lobjc"
    .section    __DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
    .long   0
    .long   1088
Your functions aren't even in the output because they don't do anything. main is simplified to:
_main:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    $1, __Tv1x3oneSi(%rip)
    movq    $2, __Tv1x3twoSi(%rip)
    movq    $3, __Tv1x5threeSi(%rip)
    xorl    %eax, %eax
    popq    %rbp
    retq
This sticks the values 1, 2, and 3 into the globals, and then exits.
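One caveat worth spelling out: the calculation calls can vanish only because they have no side effects. Here's a hedged sketch, using a hypothetical sideEffectingCalculation() that isn't part of the original program, where the optimizer must preserve the call even though the temporary adds nothing:

func sideEffectingCalculation() -> Int {
    print("computing")  // observable side effect; the optimizer must keep this call
    return 1
}

func cannotBeStripped() {
    let tempVar = sideEffectingCalculation()  // the call survives optimization because of the print
    one = tempVar
}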
My point here is that if it's smart enough to do that, don't try to second-guess it with temporary variables. Its job is to figure that out. In fact, let's see how smart it is. We'll turn off Whole Module Optimization (-wmo). Without that flag, it won't strip the functions, because it doesn't know whether something else will call them. And then we can see how it writes these functions.
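That is, the same command as before with the flag dropped:
$ swiftc -O -emit-assembly x.swift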
Here's firstFunction():
__TF1x13firstFunctionFT_T_:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    $1, __Tv1x3oneSi(%rip)
    movq    $2, __Tv1x3twoSi(%rip)
    movq    $3, __Tv1x5threeSi(%rip)
    popq    %rbp
    retq
Since it can see that the calculation methods just return constants, it inlines those results and writes them to the globals.
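(As an aside: if you ever want the calls themselves to survive in optimized output, say for benchmarking, Swift lets you forbid inlining with the @inline(never) attribute. A minimal sketch, not part of the original program:

@inline(never)
func calculation1() -> Int { return 1 }  // stays a real call in optimized builds

With that attribute, firstFunction() would have to emit an actual call to calculation1() instead of a constant store.)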
Now how about secondFunction():
__TF1x14secondFunctionFT_T_:
    pushq   %rbp
    movq    %rsp, %rbp
    popq    %rbp
    jmp     __TF1x13firstFunctionFT_T_
Yes. It's that smart. It realized that secondFunction() is identical to firstFunction() and it just jumps to it. Your functions literally could not be more identical, and the optimizer knows that.
So what's the most efficient? The one that is simplest to reason about. The one with the fewest side-effects. The one that is easiest to read and debug. That's the efficiency you should be focused on. Let the optimizer do its job. It's really quite smart. And the more you write in nice, clear, obvious Swift, the easier it is for the optimizer to do its job. Every time you do something clever "for performance," you're just making the optimizer work harder to figure out what you've done (and probably undo it).
Just to finish the thought: the local variables you create are barely hints to the compiler. The compiler generates its own local variables when it converts your code to its internal representation (IR). IR is in static single assignment (SSA) form, in which every variable can be assigned only once. Because of this, your second function actually creates more local variables than your first function. Here's function one (created using swiftc -emit-ir x.swift):
define hidden void @_TF1x13firstFunctionFT_T_() #0 {
entry:
  %0 = call i64 @_TF1x12calculation1FT_Si()
  store i64 %0, i64* getelementptr inbounds (%Si, %Si* @_Tv1x3oneSi, i32 0, i32 0), align 8
  %1 = call i64 @_TF1x12calculation2FT_Si()
  store i64 %1, i64* getelementptr inbounds (%Si, %Si* @_Tv1x3twoSi, i32 0, i32 0), align 8
  %2 = call i64 @_TF1x12calculation3FT_Si()
  store i64 %2, i64* getelementptr inbounds (%Si, %Si* @_Tv1x5threeSi, i32 0, i32 0), align 8
  ret void
}
In this form, variables have a % prefix. As you can see, there are 3.
Here's your second function:
define hidden void @_TF1x14secondFunctionFT_T_() #0 {
entry:
  %0 = alloca %Si, align 8
  %1 = bitcast %Si* %0 to i8*
  call void @llvm.lifetime.start(i64 8, i8* %1)
  %2 = call i64 @_TF1x12calculation1FT_Si()
  %._value = getelementptr inbounds %Si, %Si* %0, i32 0, i32 0
  store i64 %2, i64* %._value, align 8
  store i64 %2, i64* getelementptr inbounds (%Si, %Si* @_Tv1x3oneSi, i32 0, i32 0), align 8
  %3 = call i64 @_TF1x12calculation2FT_Si()
  %._value1 = getelementptr inbounds %Si, %Si* %0, i32 0, i32 0
  store i64 %3, i64* %._value1, align 8
  store i64 %3, i64* getelementptr inbounds (%Si, %Si* @_Tv1x3twoSi, i32 0, i32 0), align 8
  %4 = call i64 @_TF1x12calculation3FT_Si()
  %._value2 = getelementptr inbounds %Si, %Si* %0, i32 0, i32 0
  store i64 %4, i64* %._value2, align 8
  store i64 %4, i64* getelementptr inbounds (%Si, %Si* @_Tv1x5threeSi, i32 0, i32 0), align 8
  %5 = bitcast %Si* %0 to i8*
  call void @llvm.lifetime.end(i64 8, i8* %5)
  ret void
}
This one has 6 local variables! But, just like the local variables in the original source code, this tells us nothing about final performance. The compiler just creates this version because it's easier to reason about (and therefore optimize) than a version where variables can change their values.
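To make the SSA idea concrete, here's an illustrative view of how your reused tempVar gets renamed internally; the %n names in the comments are only illustrative, not actual compiler output:

var tempVar = calculation1()  // SSA: %0 = call calculation1()
tempVar = calculation2()      // SSA: %1 = call calculation2() (a fresh name, not a reuse)
tempVar = calculation3()      // SSA: %2 = call calculation3()

Every reassignment in your source becomes a brand-new compiler variable, which is why the "reuse one variable" version ends up with more internal locals, not fewer.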
(Even more dramatic is the SIL for this code (-emit-sil), which creates 16 local variables for function 1 and 17 for function 2! If the compiler is happy to invent 16 local variables just to make 6 lines of code easier for it to reason about, you certainly shouldn't be worried about the local variables you create. They're not even a minor concern; they're completely free.)