Rounding up integer without using float, double, or division

It's an embedded platform, hence these restrictions.

Original equation: 0.02035*c*c - 2.4038*c

I did this:

int32_t val = 112; // this value is arbitrary
int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = result>>24;

The precision is still poor. When we multiply val by 0x535A8, is there a way to further improve the precision by rounding up, but without using any float, double, or division?

asked Jan 17 '14 17:01

SandBag_1996

2 Answers

The problem is not precision. You're using plenty of bits.

I suspect the problem is that you're comparing two different methods of converting to int. The first is a cast of a double, the second is a truncation by right-shifting.

Converting floating point to integer simply drops the fractional part, leading to a round towards zero; right-shifting does a round down or floor. For positive numbers there's no difference, but for negative numbers the two methods will be 1 off from each other. See an example at http://ideone.com/rkckuy and some background reading at Wikipedia.
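To make the difference concrete, here is a minimal sketch of the two conversions side by side. The helper names are made up for illustration, and the double version is only meant for checking on a host machine, not for the embedded target. It also assumes the compiler implements >> on negative signed values as an arithmetic shift, which is common but implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helpers, named here only for illustration. */

/* Converting a double to an integer discards the fraction,
   which rounds toward zero. */
static int32_t convert_by_cast(double x)
{
    return (int32_t)x;
}

/* Dropping 24 fraction bits with a right shift. Assumes an arithmetic
   shift for negative values (implementation-defined in C), which
   rounds toward negative infinity, i.e. floor. */
static int32_t convert_by_shift(int32_t fixed_q24)
{
    return fixed_q24 >> 24;
}
```

For val = 112 the exact result is about -13.9552, and the question's fixed-point expression yields -234133760 before the shift. The cast gives -13 while the shift gives -14; for positive inputs both agree.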

Your original code is easy to fix:

int32_t result = (val*((val * 0x535A8) - 0x2675F70));
if (result < 0)
    result += 0xffffff;
result = result>>24;

See the results at http://ideone.com/D0pNPF

You might also just decide that the right shift result is OK as is. The conversion error isn't greater than it is for the other method, just different.

Edit: If you want to do rounding instead of truncation the answer is even easier.

int32_t result = (val*((val * 0x535A8) - 0x2675F70));
result = (result + (1L << 23)) >> 24;
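As a rough sketch, the three ways of dropping the 24 fraction bits can be put next to each other. The function names and the FRAC_BITS macro are my own, not part of the original code, and the sketch assumes val stays in roughly 0..120 so that the product fits in an int32_t, plus an arithmetic right shift for negative values.

```c
#include <stdint.h>

#define FRAC_BITS 24  /* number of fraction bits in the Q24 constants */

/* Raw Q24 value of 0.02035*val*val - 2.4038*val, using the constants
   from the question. Assumes val is small enough (about 0..120) that
   the product does not overflow int32_t. */
static int32_t poly_raw(int32_t val)
{
    return val * ((val * 0x535A8) - 0x2675F70);
}

/* Plain shift: rounds toward negative infinity (floor). */
static int32_t to_int_floor(int32_t raw)
{
    return raw >> FRAC_BITS;
}

/* Bias negative values first so the shift rounds toward zero,
   matching what a cast from double would do. */
static int32_t to_int_trunc(int32_t raw)
{
    if (raw < 0)
        raw += ((int32_t)1 << FRAC_BITS) - 1;
    return raw >> FRAC_BITS;
}

/* Add half of one unit before shifting: rounds to nearest. */
static int32_t to_int_nearest(int32_t raw)
{
    return (raw + ((int32_t)1 << (FRAC_BITS - 1))) >> FRAC_BITS;
}
```

With val = 112 (exact value about -13.9552) these give -14, -13 and -14 respectively; with val = 120 (exact value about 4.584) they give 4, 4 and 5.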

I'm going to join in with some of the others in suggesting that you use a constant expression to replace those magic constants with something that documents how they were derived.

static const int32_t a = (int32_t)(0.02035 * (1L << 24) + 0.5);
static const int32_t b = (int32_t)(2.4038 * (1L << 24) + 0.5);
int32_t result = (val*((val * a) - b));
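Putting the pieces together, a complete version might look like the sketch below. The macro and function names are invented for this example. The floating-point arithmetic appears only inside constant initializers, so it should be evaluated by the compiler rather than on the target, but that is worth confirming in the generated code for your toolchain. As before, it assumes val in roughly 0..120 and an arithmetic right shift.

```c
#include <stdint.h>

#define Q24_BITS 24

/* Converts a positive decimal constant to Q24, rounding to nearest.
   Intended for compile-time constants only. */
#define TO_Q24(x) ((int32_t)((x) * (1L << Q24_BITS) + 0.5))

static const int32_t coeff_a = TO_Q24(0.02035);  /* same as 0x535A8   */
static const int32_t coeff_b = TO_Q24(2.4038);   /* same as 0x2675F70 */

/* Evaluates 0.02035*val*val - 2.4038*val rounded to the nearest
   integer, using only integer multiply, add and shift at run time. */
static int32_t poly_rounded(int32_t val)
{
    int32_t raw = val * ((val * coeff_a) - coeff_b);
    return (raw + ((int32_t)1 << (Q24_BITS - 1))) >> Q24_BITS;
}
```

The derived constants come out identical to the magic numbers in the question, so the results do not change; only the documentation value does.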

answered Oct 11 '22 23:10

Mark Ransom

How about just scaling your constants by 100000, so that both become integers (2035 and 240380)? The largest intermediate value you then get is 2035*120*120 = 29304000, which is far below the 2^31 limit. So maybe there is no need to do real bit-tweaking here.

As noted by Joe Hass, your problem is that you shift your precision bits into the dustbin.

Whether you scale by a power of 2 or by a power of 10 does not actually matter. Just pretend your decimal point is not behind the last bit but at the shifted position. If you keep computing with the result, scaling by a power of 2 is likely easier to handle. If you just want to output the result, scale by a power of ten as proposed above, convert the digits and insert the decimal point 5 characters from the right.
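Here is a rough sketch of the power-of-ten variant. It assumes a scale factor of 100000, so that both coefficients become whole numbers (2035 and 240380) and the decimal point lands 5 characters from the right, and it assumes val stays within about 0..120. The function names are made up. The digit conversion uses snprintf purely for brevity; that routine divides internally, so on the real target you would likely swap in whatever digit conversion the platform already provides.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DECIMALS 5  /* scale factor is 10^5 */

/* 0.02035*val*val - 2.4038*val, scaled by 100000.
   Assumes val is small (about 0..120), so int32_t cannot overflow. */
static int32_t poly_scaled(int32_t val)
{
    return val * (2035 * val - 240380);
}

/* Writes the scaled value as decimal text, inserting the decimal
   point DECIMALS characters from the right. */
static void format_scaled(int32_t scaled, char out[16])
{
    char digits[12];  /* up to 10 digits plus the terminator */
    uint32_t magnitude = (scaled < 0) ? 0u - (uint32_t)scaled
                                      : (uint32_t)scaled;

    /* Zero-pad to at least DECIMALS + 1 digits so there is always
       one digit in front of the decimal point. */
    int len = snprintf(digits, sizeof digits, "%0*lu",
                       DECIMALS + 1, (unsigned long)magnitude);
    int int_len = len - DECIMALS;

    char *p = out;
    if (scaled < 0)
        *p++ = '-';
    memcpy(p, digits, (size_t)int_len);          /* integer part    */
    p += int_len;
    *p++ = '.';
    memcpy(p, digits + int_len, DECIMALS + 1);   /* fraction + NUL  */
}
```

For val = 112 this produces the text -13.95520, and for val = 120 it produces 4.58400, with no precision thrown away by a shift.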

answered Oct 12 '22 01:10

Harald

Tags: c, math, rounding
