What functions can I use in Accelerate.framework to scale a vector by a scalar, and to normalize a vector? I found one in the documentation that I think might work for scaling, but I am confused about its operation.
vDSP_vsma
Vector scalar multiply and vector add; single precision.
void vDSP_vsma (
const float *__vDSP_A,
vDSP_Stride __vDSP_I,
const float *__vDSP_B,
const float *__vDSP_C,
vDSP_Stride __vDSP_K,
float *__vDSP_D,
vDSP_Stride __vDSP_L,
vDSP_Length __vDSP_N
);
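Reading the declaration, vDSP_vsma() computes D[n] = A[n] * B + C[n] for n = 0 to N-1. B is a single scalar passed by pointer, and I, K and L are the element strides of A, C and D. The vsma_ref() function below is a hypothetical plain-C sketch of that loop, not an Accelerate function. It is only meant to show that vsma scales *and* adds.

```c
#include <stddef.h>

/* Hypothetical reference version of what vDSP_vsma() computes:
   D[n*L] = A[n*I] * (*B) + C[n*K]  for n = 0 .. N-1.
   B points at one scalar; I, K and L are element strides. */
void vsma_ref(const float *A, ptrdiff_t I,
              const float *B,
              const float *C, ptrdiff_t K,
              float *D, ptrdiff_t L,
              size_t N)
{
    for (size_t n = 0; n < N; n++) {
        ptrdiff_t i = (ptrdiff_t)n;
        D[i * L] = A[i * I] * (*B) + C[i * K];
    }
}
```

Passing a C of all zeros reduces it to plain scaling, which vDSP_vsmul() does directly without the extra vector.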
The easiest way to normalize a vector in-place is something like
int n = 3;
float v[3] = {1, 2, 3};
cblas_sscal(n, 1.0 / cblas_snrm2(n, v, 1), v, 1);
You'll need to
#include <cblas.h>
or
#include <vblas.h>
(or both). Note that several of these functions are listed in the documentation's "matrix" section even though they operate on vectors.
If you want to use the vDSP functions, see the Vector-Scalar Division section. There are several things you can do:
- vDSP_dotpr(), sqrt(), and vDSP_vsdiv()
- vDSP_dotpr(), vrsqrte_f32(), and vDSP_vsmul() (vrsqrte_f32() is a NEON intrinsic that returns only an estimate of 1/sqrt(x), though, so you need to check you're compiling for ARM, e.g. armv7)
- vDSP_rmsqv(), multiply by sqrt(n), and vDSP_vsdiv()
The reason there isn't a vector-normalization function is that the "vector" in vDSP means "lots of things at once" (up to around 4096/8192 elements) and not necessarily the "vector" from linear algebra. It's pretty meaningless to normalize a 1024-element vector, and a quick function for normalizing a 3-element vector isn't something that would make your app significantly faster, which is why there isn't one.
The intended usage of vDSP is more like normalizing 1024 2- or 3-element vectors. I can spot a handful of ways to do this:
- vDSP_vdist() to get a vector of lengths, followed by vDSP_vdiv(). You have to use vDSP_vdist() multiple times for vectors of length greater than 2, though.
- vDSP_vsq() to square all the inputs, vDSP_vadd() multiple times to add all of them, the equivalent of vDSP_vsqrt() or vDSP_vrsqrt(), and vDSP_vmul() or vDSP_vdiv() as appropriate. It shouldn't be too hard to write the equivalent of vDSP_vsqrt() or vDSP_vrsqrt().

Of course, if you don't have 1024 vectors to normalize, don't overcomplicate things.
Notes:
- L1 data caches have typically been 32K for around a decade or more (they may be shared between virtual cores in a hyperthreaded CPU, and some older or cheaper processors might have 16K), so the most you should do is around 8192 floats (8192 × 4 bytes = 32K) for in-place operation. You might want to subtract a little for stack space, and if you're doing several sequential operations you probably want to keep all the data in cache; 1024 or 2048 seem pretty sensible, and anything more will probably hit diminishing returns. If you care, measure performance.