
Speeding up floating point operations (Android ARMv6)

I'm doing some image compression in Android using native code. For various reasons, I can't use a pre-built library.

I profiled my code using the android-ndk-profiler and found that the bottleneck is -- surprisingly -- floating point operations! Here's the profile output:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 40.37      0.44     0.44                             __addsf3
 11.93      0.57     0.13     7200     0.02     0.03  EncodeBlock
  6.42      0.64     0.07   535001     0.00     0.00  BitsOut
  6.42      0.71     0.07                             __aeabi_fdiv
  6.42      0.78     0.07                             __gnu_mcount_nc
  5.50      0.84     0.06                             __aeabi_fmul
  5.50      0.90     0.06                             __floatdisf
  ...

I googled __addsf3 and apparently it is a software floating point operation. Yuck. I did more research on the ARMv6 architecture core, and unless I missed something, it doesn't have hardware floating point support. So what can I do here to speed this up? Fixed-point? I know that's normally done with integers, but I'm not really sure how to convert my code to do that. Is there a compiler flag I could set so it will do that? Other suggestions welcome.

Nick asked Dec 27 '22 22:12


2 Answers

Of course you can do everything with integer arithmetic only (after all, that is exactly what your program is doing right now), but whether it can be done faster depends on what exactly you are trying to do.

Floating point is a sort of generic solution that you can apply in most cases and then forget about, but it is fairly rare that a problem really needs numbers ranging from the incredibly small to the incredibly big with 52 bits of mantissa accuracy. Supposing your computations are about graphics, with a double-precision floating-point number you can go from far below sub-atomic scale to far beyond the size of the universe... is that range really needed? With floating point the absolute accuracy depends on the scale of the number, but what accuracy do you really need?

What are your numbers used for in your "inner loop"? Without knowing that, it is hard to say whether the computation can be made much faster. Almost surely it can be made faster (FP is a generic, blind solution), but the gain you can hope for varies a lot. I don't know the specific software floating-point implementation being used, but I'd expect it to be reasonably efficient (for the generic case).

You should aim at a higher logical level of optimization.

For image (de)compression based on, say, a DCT or wavelet transform, I think there is indeed no need for floating-point arithmetic: you can work out the exact scales your numbers will have and use integer arithmetic. Moreover, you may have an extra degree of freedom because approximate results are acceptable.
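To make the integer idea concrete, here is a minimal sketch of a Q16.16 fixed-point multiply, the kind of operation a DCT-style inner loop is built from. The names (q16_t, q16_mul, scale_sample) and the choice of 16 fractional bits are assumptions for illustration, not something taken from the question's code; the right split between integer and fractional bits depends on the actual range of your values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_t;
#define Q16_SHIFT 16
#define Q16_ONE   ((q16_t)1 << Q16_SHIFT)
#define Q16_HALF  (Q16_ONE >> 1)

/* cos(pi/4) ~= 0.70710678, scaled by 2^16 and rounded. */
#define Q16_COS_PI_4 ((q16_t)46341)

/* Convert a small integer to Q16.16. The value must fit in 16 signed bits. */
static q16_t q16_from_int(int32_t v)
{
    assert(v > -32768 && v < 32768);
    return (q16_t)(v * Q16_ONE);
}

/* Convert back to an integer, rounding to nearest.
   Note: right-shifting a negative value is implementation-defined in C;
   common compilers perform an arithmetic shift. */
static int32_t q16_to_int_round(q16_t v)
{
    return (int32_t)(((int64_t)v + Q16_HALF) >> Q16_SHIFT);
}

/* Multiply two Q16.16 values using a 64-bit intermediate so the
   product does not overflow before it is scaled back down. */
static q16_t q16_mul(q16_t a, q16_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (q16_t)((p + Q16_HALF) >> Q16_SHIFT);
}

/* Scale an integer sample by cos(pi/4) without any floating point. */
static int32_t scale_sample(int32_t sample)
{
    return q16_to_int_round(q16_mul(q16_from_int(sample), Q16_COS_PI_4));
}
```

For example, scale_sample(100) gives 71, which matches 100 * 0.7071 rounded to the nearest integer. The constants are precomputed once, so the inner loop only does integer multiplies, adds and shifts.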

6502 answered Jan 14 '23 12:01


See 6502's excellent answer first...

Most processors don't have FPUs because they are not needed. And when they do, for some reason they try to conform to IEEE 754, which is equally unnecessary; the cases that need any of that are quite rare. The FPU is just an integer ALU with some stuff around it to keep track of the floating point, all of which you can do yourself.

How? Let's think in decimal and dollars. We can think about $110.50, add $0.07 and get $110.57, or we could do everything in pennies, 11050 + 7 = 11057, and then place a dot in the right position when printing for the user. That is all the FPU is doing, and that is all you need to do. This link may or may not give some insight into this: http://www.divms.uiowa.edu/~jones/bcd/divide.html
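The pennies example can be sketched in a few lines of C. The names cents_t, add_cents and format_cents are made up for this illustration, and the formatting assumes a non-negative amount:

```c
#include <assert.h>
#include <stdio.h>

/* Amounts are stored as whole pennies; the decimal point
   only appears when the value is printed. */
typedef long cents_t;

/* Adding money is plain integer addition. */
static cents_t add_cents(cents_t a, cents_t b)
{
    return a + b;
}

/* Writes e.g. 11057 as "110.57" into buf.
   Assumes amount >= 0. Returns what snprintf returns. */
static int format_cents(cents_t amount, char *buf, size_t size)
{
    assert(amount >= 0);
    return snprintf(buf, size, "%ld.%02ld", amount / 100, amount % 100);
}
```

Calling add_cents(11050, 7) yields 11057, and format_cents turns that into the text 110.57. The "point" here is fixed at two decimal digits; a binary fixed-point format does the same thing with a power of two instead of 100.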

Don't blanket all ARMv6 processors that way; that is not how ARMs are categorized. Some cores have the option for an FPU, or you can add one yourself after you buy the core from ARM, etc. The ARM11s, for example, are ARMv6 with the option for an FPU.
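If you want to see which way your build is currently going, a compile-time check along these lines may help. It is a sketch under an assumption: that your toolchain defines the __SOFTFP__ macro when it generates software floating-point calls, which to my knowledge GCC's ARM back end does. The function name is hypothetical. GCC also has the -mfloat-abi and -mfpu options (for example -mfloat-abi=softfp -mfpu=vfp) for targeting a VFP unit, but whether your particular device actually has one is something you need to verify first.

```c
/* Returns 1 if this translation unit was compiled for software
   floating point, 0 otherwise. Relies on the __SOFTFP__ macro,
   which is an assumption about the toolchain; on targets that do
   not define it this simply reports 0. */
static int build_uses_soft_float(void)
{
#if defined(__SOFTFP__)
    return 1;
#else
    return 0;
#endif
}
```

If this reports 1 and calls such as __addsf3 dominate the profile, the float operations are being done by library routines rather than by hardware instructions.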

Also, even though you can keep track of the decimal point yourself, if there is a hard FPU it may well be faster than doing it yourself in fixed point. Likewise, it is possible and easy to not know how to use an FPU and get bad results, just faster. It is very easy to write bad floating-point code. Whether you use fixed or float, you need to keep track of the range of your numbers and from that control where you move the point, so that the integer math at the core stays within the mantissa. This means that to use floating point effectively you should be thinking in terms of what the integer math is doing. One very common mistake is to think that multiplies mess up your precision, when it is actually addition and subtraction that can hurt you the most.
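A small illustration of how addition can hurt: a single-precision float has a 24-bit significand, so once a value reaches 2^24 = 16777216, adding 1 to it changes nothing. The function name below is made up for the example, and the volatile is only there so the sum is really rounded to float rather than kept in a wider register.

```c
/* Returns 1 if adding `small` to `big` leaves `big` unchanged
   in single precision, i.e. the small term is lost entirely. */
static int add_is_absorbed(float big, float small)
{
    /* volatile forces the result to be stored as a float */
    volatile float sum = big + small;
    return sum == big;
}
```

With this, add_is_absorbed(16777216.0f, 1.0f) is 1 (the 1 vanishes), while add_is_absorbed(1024.0f, 1.0f) is 0. An accumulator that sums many small terms into a large running total runs into exactly this, in fixed point as well as in float, if the point is placed badly.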

old_timer answered Jan 14 '23 13:01