I am playing with C and Swift 3.0 code using vecLib and Accelerate framework from Apple as dynamic lib + my code in C lang based project and Swift playground.
And in situation with calling Apple's wrapper from framework of SIMD instruction with 1 or < 4 elements computation function like vvcospif()
from framework is slower than simple standart cos(x * PI)
when functions calls from loop near 1.000 times as example.
I know about difference between vvcospif()
and cos()
, I should use exactly vvcospif()
for x * PI
.
Example in playground, you can just copy code and run it:
import Cocoa
import Accelerate
func cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
let ft: Float = alpha * 3.1415927;
let f: Float = (1 - cos(ft)) * 0.5;
return a + f*(b - a);
}
var start: Date = NSDate() as Date
var interp: Float;
for index in 0..<1000 {
interp = cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
var end = NSDate();
var timeInterval: Double = end.timeIntervalSince(start);
print("cosine_interpolate in \(timeInterval) seconds")
func fast_cosine_interpolate(alpha: Float, a: Float, b: Float) -> Float {
var x: Float = alpha
var count: Int32 = 1
var result: Float = 0
vvcospif(&result, &x, &count)
let SINSIN_HALF_X: Float = (1 - result) * 0.5;
return a + SINSIN_HALF_X * (b - a);
}
start = NSDate() as Date
for index in 0..<1000 {
interp = fast_cosine_interpolate(alpha: 0.25, a: 1.0, b: 0.75)
}
end = NSDate();
timeInterval = end.timeIntervalSince(start);
print("fast_cosine_interpolate in \(timeInterval) seconds")
My question is:
vvcospif()
is slow in this example?May be because vvcospif()
it is wrapper under Objective-C runtime and converting data structures / copying of memory from Intel SIMD -> Objective-C -> Swift runtime is slower then tiny cos()
?
#include <Accelerate/Accelerate.h>
vvcospif(resultVector, inputVector, &count);
when inputVector
and resultVector
is small arrays with 1 or 2 elements or just float variable, and calls in loop with ~ 1.000.000 times.
cos(x * PI)
computation time near 20 ms.
and
vvcospif(x)
with processing one float
or float array[2]
- computation time near 80 ms! Where is Acceleration? :)
Yes, in Xcode I use compiler -O -whole-module-optimization
optimisation with whole module opt. enabled.
For a more detailed discussion with examples, see "Introduction to Fast Bezier (and Trying the Accelerate.framework)".
The first, fundamental problem is that non-inlined function calls are extremely expensive. You don't want function calls if you can possibly help it in performance-critical code. Within a module, the compiler can often inline functions for you, and parts of stdlib can be inlined for you. But when you start crossing module barriers, Swift generally cannot optimize out the call.
The point of SIMD functions is that you set up all your data in the right format, and then call them just one time. That way the cost of the function call is made up by the SIMD optimized code you're calling.
But remember, you don't have to call into Accelerate to get SIMD optimizations. The compiler is perfectly capable of noticing you've written a loop and turning it into an inline SIMD algorithm itself (and it does this all the time). So for many simple problems, the compiler is going to win anyway. Think about it: if calling vvcospif
with a count of 1 were faster than calling cos
, wouldn't they just implement cos
that way?
I haven't played with your code much, but if you want to improve its performance with Accelerate, you want to think about how to arrange all your input data so you can call vvcospif
one time with a large N. It's quite possible in that case that it will be much faster that a loop (since cos
is not trivial).
If you want an example of Accelerate in practice, and how you need to organize your data, see PinchText. This code is computing offsets for a page full of a few thousand glyphs based on up to 10 touches in real-time, with animations (see PinchText.mov for what the result looks like). In particular look at adjustViewPositions:count:forTouchPoint:
. Notice how count
is large, and the data is transformed step by step with no loops. Even throwing in a (very expensive) ObjC method call into that method doesn't matter very much because it's only made one time. Getting rid of function calls in loops is a huge part of performance programming.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With