Compare a 32 bit float and a 32 bit integer without casting to double, when either value could be too large to fit the other type exactly

Tags: c++, floating-point, precision, arm

I have a 32-bit floating-point number f (known to be positive) that I need to convert to a 32-bit unsigned integer. Its magnitude might be too large to fit. Furthermore, there is downstream computation that requires some headroom. I can compute the maximum acceptable value m as a 32-bit integer. How do I efficiently determine in C++11 on a constrained 32-bit machine (ARM M4F) whether f <= m holds mathematically? Note that the types of the two values don't match. The following three approaches each have their issues:

- static_cast<uint32_t>(f) <= m: I think this triggers undefined behaviour if f doesn't fit the 32-bit integer
- f <= static_cast<float>(m): if m is too large to be converted exactly, the converted value could be larger than m, so that the subsequent comparison produces the wrong result in certain edge cases
- static_cast<double>(f) <= static_cast<double>(m): is mathematically correct, but requires casting to, and working with, double, which I'd like to avoid for efficiency reasons

Surely there must be a way to convert an integer to a float directly with specified rounding direction, i.e. guaranteeing the result not to exceed the input in magnitude. I'd prefer a C++11 standard solution, but in the worst case platform intrinsics could qualify as well.


asked May 09 '17 06:05

burnpanck

1 Answer

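I think your best bet is to be a bit platform-specific. 2³² can be represented exactly as a float. First check whether f is too large to fit in a 32-bit unsigned integer at all; if it fits, convert it to unsigned and compare against m. The code assumes unsigned int is 32 bits wide, as it is on the Cortex-M4F.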

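const float unsigned_limit = 4294967296.0f;  // 2^32, exactly representable as a float
bool ok = false;
if (f < unsigned_limit)  // also false when f is NaN
{
    // f is in range here, so the conversion is well defined; it truncates toward zero.
    const auto uf = static_cast<unsigned int>(f);
    if (uf <= m)
    {
        ok = true;
    }
}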

I'm not fond of needing two comparisons, but the code is clear.
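
One thing to be aware of: because the conversion truncates, the snippet tests whether the converted value is at most m, which is what matters when f is going to be converted anyway. For f = 3.5 and m = 3 it reports true, although f <= m is false mathematically. If the strict mathematical comparison is needed, one possible extension is sketched below. The function names are hypothetical, and the second function is my own addition rather than part of the approach above.

```cpp
#include <cassert>
#include <cstdint>

// Truncating test: true when f converts to uint32_t without overflow
// and the truncated value does not exceed m.
inline bool truncated_fits(float f, std::uint32_t m) {
    assert(!(f < 0.0f));  // precondition from the question: f is not negative
    const float unsigned_limit = 4294967296.0f;  // 2^32, exactly representable
    if (!(f < unsigned_limit)) return false;     // too large, or NaN
    return static_cast<std::uint32_t>(f) <= m;
}

// Exact test of f <= m without using double.
inline bool exactly_le(float f, std::uint32_t m) {
    assert(!(f < 0.0f));  // precondition from the question: f is not negative
    const float unsigned_limit = 4294967296.0f;
    if (!(f < unsigned_limit)) return false;     // too large, or NaN
    const std::uint32_t uf = static_cast<std::uint32_t>(f);  // floor(f) for f >= 0
    if (uf != m) return uf < m;
    // floor(f) == m, so f <= m holds only if f has no fractional part.
    // Converting uf back to float is exact: below 2^24 every integer is
    // representable, and from 2^24 upwards f is already an integer.
    return static_cast<float>(uf) == f;
}
```

The extra cost over the truncating version is one integer-to-float conversion and one float comparison, and only when the truncated value equals m.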

If f is usually significantly less than m (or usually significantly greater), you can first test against float(m)*0.99f (respectively float(m)*1.01f) and fall back to the exact comparison only in the unusual case. That is probably only worth doing if profiling shows that the performance gain justifies the extra complexity.
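
A rough sketch of that fast path follows. The function name is hypothetical. The slow path here decides f <= m exactly rather than on the truncated value, which is my assumption: it keeps the result consistent with the two quick tests for small m. The 1% margins are far larger than the rounding error of the conversion, which is about one part in 2²⁴.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decide f <= m with cheap float-only tests first.
inline bool le_with_fast_path(float f, std::uint32_t m) {
    assert(!(f < 0.0f));  // precondition from the question: f is not negative
    const float approx = static_cast<float>(m);  // may differ from m by rounding
    if (f < approx * 0.99f) return true;   // clearly below m
    if (f > approx * 1.01f) return false;  // clearly above m (also +infinity)

    // Unusual case: f is within about 1% of m (or is NaN), so decide exactly.
    const float unsigned_limit = 4294967296.0f;  // 2^32, exactly representable
    if (!(f < unsigned_limit)) return false;     // too large, or NaN
    const std::uint32_t uf = static_cast<std::uint32_t>(f);  // floor(f) for f >= 0
    if (uf != m) return uf < m;
    return static_cast<float>(uf) == f;  // equal floors: true only if f is an integer
}
```

Whether the two extra multiplications and comparisons pay off depends on how often the slow path is actually reached, so this is something to measure rather than assume.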


answered Oct 05 '22 14:10

Martin Bonner supports Monica
