When is it more efficient to use CORDIC or a polynomial approximation?

Tags: c, algorithm, math, floating-point, assembly

I am working on an architecture that has no floating-point hardware, only a 16-bit ALU and a 40-bit MAC.

I have already implemented 32-bit single-precision floating-point addition/subtraction, multiplication, cosine, sine, division, square root, and range reduction, all in software, on this architecture.

To implement cosine and sine, I first performed range reduction using the method described in the paper "ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by K. C. Ng. I then implemented cosine and sine as polynomial approximations on the range -pi/4 to +pi/4, taking the polynomials from the book "Computer Approximations" by Hart et al.
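Reducing to [-pi/4, +pi/4] also yields a quadrant number that decides which polynomial (sine or cosine) to evaluate and with which sign. The sketch below shows only that structure, under stated assumptions: it uses a naive single-constant reduction that loses accuracy as |x| grows (the problem Ng's method is designed to avoid), `reduce` and `apply_quadrant` are hypothetical names, and `double` stands in for a software floating-point type.

```c
#define TWO_OVER_PI 0.63661977236758134308
#define PI_OVER_2   1.57079632679489661923

/* Naive reduction: x = k*(pi/2) + r with r in about [-pi/4, pi/4].
   Returns r and stores the quadrant q = k mod 4 (always 0..3).
   Accuracy degrades for large |x|; a careful method (e.g. Ng's) is needed there. */
static double reduce(double x, int *q) {
    double kf = x * TWO_OVER_PI;
    long k = (long)(kf >= 0 ? kf + 0.5 : kf - 0.5);  /* round to nearest */
    *q = (int)(((k % 4) + 4) % 4);                   /* non-negative mod 4 */
    return x - (double)k * PI_OVER_2;
}

/* Map kernel results sin(r), cos(r) to sin(x), cos(x) for quadrant q. */
static void apply_quadrant(int q, double s_r, double c_r, double *s, double *c) {
    switch (q) {
    case 0:  *s =  s_r; *c =  c_r; break;
    case 1:  *s =  c_r; *c = -s_r; break;   /* x = pi/2 + r  */
    case 2:  *s = -s_r; *c = -c_r; break;   /* x = pi + r    */
    default: *s = -c_r; *c =  s_r; break;   /* x = 3pi/2 + r */
    }
}
```

For example, `reduce(3.0, &q)` gives q = 2 and r = 3 - pi, so sin(3) = -sin(r).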

I have also heard that I should consider the CORDIC algorithm. Does anyone know whether it would be more or less efficient (in terms of throughput, memory overhead, and number of instructions required) than the method I already use? My software functions run on a multicore architecture where each core has only 128 words of instruction memory and 128 words of 16-bit data memory. I have also searched for how to implement CORDIC for cosine and sine, but I couldn't find any good resources for 32-bit floating-point implementations. Does anybody have suggestions?

Thank you!


asked Mar 14 '13 18:03

Veridian

1 Answer

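CORDIC gives you roughly one bit of result per loop iteration, so a 24-bit single-precision significand needs about 24 iterations, each made of shifts, additions and a table lookup. Implemented in software, that will likely be slower than your polynomial version. That may also be why it is hard to find articles on software implementations of CORDIC: its performance is inferior, so nobody bothers.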
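For comparison, here is a minimal rotation-mode CORDIC sketch in 32-bit fixed point (Q2.30), assuming the argument is already reduced to [-pi/4, pi/4] and that right shifts of negative `int32_t` are arithmetic (common, but implementation-defined in C). The names `cordic_sincos`, `atan_pow2` and `CORDIC_ITERS`, the Q30 format and the 30-iteration count are illustrative choices; `double` appears only at the interface and to build the arctangent table without libm.

```c
#include <stdint.h>

#define CORDIC_ITERS 30
#define Q30 1073741824.0              /* 2^30: Q2.30 scale factor */

/* atan(2^-i) from its Taylor series (i >= 1), so no libm is needed. */
static double atan_pow2(int i) {
    if (i == 0) return 0.78539816339744830962;      /* atan(1) = pi/4 */
    double t = 1.0;
    for (int k = 0; k < i; k++) t *= 0.5;
    double t2 = t * t, p = t, sum = 0.0;
    for (int k = 0; k < 40; k++) {
        double term = p / (2 * k + 1);
        sum += (k % 2 == 0) ? term : -term;
        p *= t2;
    }
    return sum;
}

static int32_t atan_tab[CORDIC_ITERS];

/* Rotation mode: each iteration rotates by +/-atan(2^-i) using only shifts
   and adds, driving z toward 0 and gaining about one bit of accuracy. */
static void cordic_sincos(double angle, double *s, double *c) {
    static int ready = 0;
    if (!ready) {
        for (int i = 0; i < CORDIC_ITERS; i++)
            atan_tab[i] = (int32_t)(atan_pow2(i) * Q30 + 0.5);
        ready = 1;
    }
    int32_t x = (int32_t)(0.60725293500888125617 * Q30);  /* 1/CORDIC gain */
    int32_t y = 0;
    int32_t z = (int32_t)(angle * Q30);
    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t xs = x >> i, ys = y >> i;   /* assumes arithmetic shift */
        if (z >= 0) { x -= ys; y += xs; z -= atan_tab[i]; }
        else        { x += ys; y -= xs; z += atan_tab[i]; }
    }
    *c = x / Q30;
    *s = y / Q30;
}
```

The loop count is the cost the answer refers to: 30 passes of 32-bit shift/add work, which on a 16-bit ALU means several instructions each, versus a handful of multiply-adds for the polynomial.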

Re your comment: Horner's method evaluates a polynomial from the highest-order coefficient down to the lowest, repeatedly multiplying the running result by the variable x and then adding the next coefficient. In contrast, the naive method (i.e., computing the powers of x first, then multiplying them by their coefficients and summing) takes more work and can be less numerically stable than Horner's method.
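A small sketch contrasting the two evaluation orders, with a multiplication counter added for illustration (the function names and the counter are hypothetical, not from the answer). Coefficients are stored highest order first: c[0]*x^n + ... + c[n].

```c
/* Horner: acc = (...((c0*x + c1)*x + c2)...)*x + cn, n multiplications. */
static double horner(const double *c, int n, double x, int *muls) {
    double acc = c[0];
    *muls = 0;
    for (int i = 1; i <= n; i++) {
        acc = acc * x + c[i];
        (*muls)++;
    }
    return acc;
}

/* Naive: build x^k incrementally and add c[n-k]*x^k, 2n multiplications
   as written (one for the power, one for the coefficient). */
static double naive(const double *c, int n, double x, int *muls) {
    double sum = c[n], xk = 1.0;
    *muls = 0;
    for (int k = 1; k <= n; k++) {
        xk *= x;
        sum += c[n - k] * xk;
        *muls += 2;
    }
    return sum;
}
```

For p(x) = 2x^3 - 6x^2 + 2x - 1 at x = 3, both return 5, using 3 and 6 multiplications respectively.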

You haven't mentioned exactly how you're trying to evaluate your polynomials, so I will suggest a formula:

x2 = x * x
cos = ((COS_D * x2 + COS_C) * x2 + COS_B) * x2 + COS_A
sin = (((SIN_D * x2 + SIN_C) * x2 + SIN_B) * x2 + SIN_A) * x
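A runnable C version of the formula above, as a sketch. The COS_*/SIN_* values are assumptions: plain Taylor coefficients used as placeholders (the next paragraph explains why range-adapted constants are better), and `double` stands in for the poster's software floating-point type.

```c
/* Placeholder constants: Taylor coefficients, not range-adapted values. */
#define COS_A  1.0
#define COS_B (-1.0 / 2.0)
#define COS_C  (1.0 / 24.0)
#define COS_D (-1.0 / 720.0)
#define SIN_A  1.0
#define SIN_B (-1.0 / 6.0)
#define SIN_C  (1.0 / 120.0)
#define SIN_D (-1.0 / 5040.0)

/* Horner's method in x2 = x*x; intended for x in [-pi/4, pi/4]. */
static double poly_cos(double x) {
    double x2 = x * x;
    return ((COS_D * x2 + COS_C) * x2 + COS_B) * x2 + COS_A;
}

/* Odd function: evaluate an even polynomial in x2, then multiply by x. */
static double poly_sin(double x) {
    double x2 = x * x;
    return (((SIN_D * x2 + SIN_C) * x2 + SIN_B) * x2 + SIN_A) * x;
}
```

Each function costs one multiplication for x2 plus three (cosine) or four (sine) multiply-adds, which maps naturally onto a MAC unit.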

Note that you can get better precision if you adapt your constants to the range over which you are evaluating the function, rather than using the Taylor coefficients. (Again, apologies if you have done some or all of these things, but you didn't mention what you had already tried...)
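To illustrate adapting the constants to the range, this sketch fits the four cosine constants (lowest order first, matching COS_A..COS_D) by interpolating cos(sqrt(t)) at Chebyshev nodes on t = x*x in [0, (pi/4)^2], then measures the worst-case error. Chebyshev interpolation is a simple near-minimax stand-in, not the Remez-derived minimax fit tabulated by Hart et al.; `ref_cos_t`, `cheb_fit` and `max_err` are hypothetical helpers, and a long Taylor series serves as the reference so no libm is needed.

```c
#define NCOEF 4     /* cos(x) ~ a0 + a1*t + a2*t^2 + a3*t^3, t = x*x */
#define PI 3.14159265358979323846
#define ABS(v) ((v) < 0 ? -(v) : (v))

/* Reference cos(sqrt(t)) from a long Taylor series in t. */
static double ref_cos_t(double t) {
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < 30; k++) {
        term *= -t / ((2.0 * k - 1.0) * (2.0 * k));
        sum += term;
    }
    return sum;
}

/* Interpolate at NCOEF Chebyshev nodes on [0, tmax]; solve the
   Vandermonde system by Gaussian elimination with partial pivoting. */
static void cheb_fit(double tmax, double a[NCOEF]) {
    double m[NCOEF][NCOEF + 1];               /* augmented matrix */
    for (int j = 0; j < NCOEF; j++) {
        double ang = (2.0 * j + 1.0) * PI / (2.0 * NCOEF);
        double t = 0.5 * tmax * (1.0 + ref_cos_t(ang * ang)); /* uses cos(ang) */
        double p = 1.0;
        for (int k = 0; k < NCOEF; k++) { m[j][k] = p; p *= t; }
        m[j][NCOEF] = ref_cos_t(t);
    }
    for (int col = 0; col < NCOEF; col++) {
        int piv = col;
        for (int r = col + 1; r < NCOEF; r++)
            if (ABS(m[r][col]) > ABS(m[piv][col])) piv = r;
        for (int k = 0; k <= NCOEF; k++) {
            double tmp = m[col][k]; m[col][k] = m[piv][k]; m[piv][k] = tmp;
        }
        for (int r = col + 1; r < NCOEF; r++) {
            double f = m[r][col] / m[col][col];
            for (int k = col; k <= NCOEF; k++) m[r][k] -= f * m[col][k];
        }
    }
    for (int r = NCOEF - 1; r >= 0; r--) {    /* back substitution */
        double s = m[r][NCOEF];
        for (int k = r + 1; k < NCOEF; k++) s -= m[r][k] * a[k];
        a[r] = s / m[r][r];
    }
}

/* Largest |error| of the Horner form over 1001 points in [0, xmax]. */
static double max_err(const double a[NCOEF], double xmax) {
    double worst = 0.0;
    for (int i = 0; i <= 1000; i++) {
        double x = xmax * i / 1000.0, t = x * x;
        double e = ABS(((a[3] * t + a[2]) * t + a[1]) * t + a[0] - ref_cos_t(t));
        if (e > worst) worst = e;
    }
    return worst;
}
```

With the Taylor set {1, -1/2, 1/24, -1/720}, the error reaches about 3.6e-6 at x = pi/4, while the fitted set spreads its error across the interval and comes out markedly smaller for the same operation count. On the target, the fit would be done once offline and only the resulting constants stored.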

This is probably less relevant for your case (which presumably has just a single 16x16-bit MAC), but if your processor can issue several arithmetic operations at once, you may get better performance by writing the evaluation in a tree-like form, which shortens the chain of dependent operations:

x2 = x * x
x4 = x2 * x2
cos = (COS_D * x2 + COS_C) * x4 + (COS_B * x2 + COS_A)
sin = ((SIN_D * x2 + SIN_C) * x4 + (SIN_B * x2 + SIN_A)) * x
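A C rendering of the tree-like (Estrin-style) form above, as a sketch. The constants are the same assumed Taylor placeholders, not tuned values, and `sincos_tree` is a hypothetical name.

```c
/* Placeholder Taylor constants (assumption; replace with range-adapted ones). */
#define COS_A  1.0
#define COS_B (-1.0 / 2.0)
#define COS_C  (1.0 / 24.0)
#define COS_D (-1.0 / 720.0)
#define SIN_A  1.0
#define SIN_B (-1.0 / 6.0)
#define SIN_C  (1.0 / 120.0)
#define SIN_D (-1.0 / 5040.0)

static void sincos_tree(double x, double *s, double *c) {
    double x2 = x * x;
    double x4 = x2 * x2;
    /* These four multiply-adds depend only on x2, not on each other or on x4,
       so hardware that issues several operations at once can overlap them. */
    double c_hi = COS_D * x2 + COS_C;
    double c_lo = COS_B * x2 + COS_A;
    double s_hi = SIN_D * x2 + SIN_C;
    double s_lo = SIN_B * x2 + SIN_A;
    *c = c_hi * x4 + c_lo;
    *s = (s_hi * x4 + s_lo) * x;
}
```

It computes the same polynomials as the Horner form; only the grouping (and therefore the rounding and the dependency chain) differs.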

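If your processor has a vector ALU, the tree-like formula above also suggests a productive use for it: its four inner multiply-adds are independent of one another and could occupy separate vector lanes.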
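One way to picture the lane assignment, as a sketch only: it uses the GCC/Clang `vector_size` extension (not standard C) with four `double` lanes, which the poster's 16-bit DSP would not have; the lane layout and the Taylor placeholder constants are assumptions for illustration.

```c
/* Placeholder Taylor constants (assumption). */
#define COS_A  1.0
#define COS_B (-1.0 / 2.0)
#define COS_C  (1.0 / 24.0)
#define COS_D (-1.0 / 720.0)
#define SIN_A  1.0
#define SIN_B (-1.0 / 6.0)
#define SIN_C  (1.0 / 120.0)
#define SIN_D (-1.0 / 5040.0)

typedef double v4d __attribute__((vector_size(32)));   /* 4 doubles */

static void sincos_vec(double x, double *s, double *c) {
    /* Lanes: cos_hi, cos_lo, sin_hi, sin_lo. */
    const v4d hi = {COS_D, COS_B, SIN_D, SIN_B};
    const v4d lo = {COS_C, COS_A, SIN_C, SIN_A};
    double x2 = x * x, x4 = x2 * x2;
    v4d vx2 = {x2, x2, x2, x2};
    v4d p = hi * vx2 + lo;          /* one vector multiply-add for all four */
    *c = p[0] * x4 + p[1];
    *s = (p[2] * x4 + p[3]) * x;
}
```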

answered Nov 03 '22 01:11

comingstorm
