Certain DSP-type workloads show very significant performance improvements on Intel x86/x86_64 processors when linked against the Intel IPP library.
I am wondering if there is something similar on the ARM side, especially something that might work across ARM9, ARM11, and Cortex-A8/A9 cores (not necessarily with the same level of performance boost).
Finally, the following question might not be acceptable here, so mods, please be kind enough to leave a comment and I can edit it out.
I've been trying to read the IPP License Agreement, but it is not clear whether the commercial IPP license for Linux, selling for US$199 plus taxes, entitles one to a single personal copy of the library (possibly for commercial use), or whether one can link an application against the library and sell it for commercial gain. Or does that require a different kind of license? I couldn't find a place on Intel's site to ask this question (nothing like "Contact sales")!
There is also the ARM-sponsored open-source project Ne10, which initially covers a small set of floating-point, vector arithmetic, and matrix manipulation functions.
There are several answers to your question, depending on how you look at it.
Intel IPP is a library with many pre-cooked functions for common tasks such as fast Fourier transforms (FFTs). There are libraries in the open-source community that do the same; look at:
and many others. Not all of these libraries come with optimizations for the various ARM cores.
The second angle to your question is why you want something that works across significantly different ARM cores. On Cortex-A family processors, you have the (optional!) ARM NEON SIMD instructions that (like MMX/SSE/AltiVec) take a set of data at once and apply the same operation to all of it. This reduces the number of instructions needed to process a given amount of data. The ARM11 family has something similar but much more restricted: the VFP (vector floating-point) unit. The ARM9 family really lacks this kind of optimization. Apart from that, the ARM architecture has Thumb and Thumb-2, which can result in smaller and faster code.
The end result is that an optimized library that really runs across a multitude of ARM cores will need several implementations of the same algorithm, one per core family. This increases the library size. Are you willing to pay that price?
On iOS there is the Accelerate framework, which is optimized for ARM and uses SIMD where available; see Apple's documentation.