Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rounding integers routine

There is something that baffles me with integer arithmetic in tutorials. To be precise, integer division.

The seemingly preferred method is by casting the divisor into a float, then rounding the float to the nearest whole number, then cast that back into integer:

#include <cmath>

int round_divide_by_float_casting(int a, int b){
    return  (int) std::roundf( a / (float) b);
}

Yet this seems like scratching your left ear with your right hand. I use:

int round_divide (int a, int b){
    return a / b + a % b * 2 / b;
}

It's no breakthrough, but the fact that it is not standard makes me wonder if I am missing anything?

Despite my (albeit limited) testing, I couldn't find any scenario where the two methods give me different results. Did someone run into some sort of scenario where the int → float → int casting produced more accurate results?

like image 566
Adl A Avatar asked Mar 01 '16 07:03

Adl A


People also ask

How do you round integer numbers?

If the digit in the tenths place is less than 5, then round down, which means the units digit remains the same; if the digit in the tenths place is 5 or greater, then round up, which means you should increase the unit digit by one.

What is the 5 rounding rule?

If the first digit to be dropped is a digit greater than 5, or if it is a 5 followed by digits other than zero, the excess digits are all dropped and the last retained digit is increased in value by one unit. Rule 1.

Is 0.5 rounded up or down?

0.5 rounded off to the nearest whole number is 1. Since, the value after decimal is equal to 5, then the number is rounded up to the next whole number. Hence, the whole number of 0.5 will be 1.


2 Answers

Arithmetic solution

If one defined what your functions should return, she would describe it as something close as "f(a, b) returns the closest integer of the result of the division of a by b in the real divisor ring."

Thus, the question can be summarized as: can we define this closest integer using only integer division. I think we can.

There is exactly two candidates as the closest integer: a / b and (a / b) + 1(1). The selection is easy, if a % b is closer to 0 as it is to b, then a / b is our result. If not, (a / b) + 1 is.

One could then write something similar to, ignoring optimization and good practices:

int divide(int a, int b)
{
    const int quot = a / b;
    const int rem = a % b;
    int result;

    if (rem < b - rem) {
        result = quot;
    } else {
        result = quot + 1;
    }
    return result;
}

While this definition satisfies out needs, one could optimize it by not computing two times the division of a by b with the use of std::div():

int divide(int a, int b)
{
    const std::div_t dv = std::div(a, b);
    int result = dv.quot;

    if (dv.rem >= b - dv.rem) {
        ++result;
    }
    return result;
}

The analysis of the problem we did earlier assures us of the well defined behaviour of our implementation.

(1)There is just one last thing to check: how does it behaves when a or b is negative? This is left to the reader ;).

Benchmark

#include <iostream>
#include <iomanip>
#include <string>

// solutions
#include <cmath>
#include <cstdlib>

// benchmak
#include <limits>
#include <random>
#include <chrono>
#include <algorithm>
#include <functional>

//
// Solutions
//
namespace
{
    int round_divide_by_float_casting(int a, int b) {
        return  (int)roundf(a / (float)b);
    }

    int round_divide_by_modulo(int a, int b) {
        return a / b + a % b * 2 / b;
    }

    int divide_by_quotient_comparison(int a, int b)
    {
        const std::div_t dv = std::div(a, b);
        int result = dv.quot;

        if (dv.rem >= b - dv.rem)
        {
            ++result;
        }
        return result;
    }
}

//
// benchmark
//
class Randomizer
{
    std::mt19937 _rng_engine;
    std::uniform_int_distribution<int> _distri;

public:
    Randomizer() : _rng_engine(std::time(0)), _distri(std::numeric_limits<int>::min(), std::numeric_limits<int>::max())
    {
    }

    template<class ForwardIt>
    void operator()(ForwardIt begin, ForwardIt end)
    {
        std::generate(begin, end, std::bind(_distri, _rng_engine));
    }
};

class Clock
{
    std::chrono::time_point<std::chrono::steady_clock> _start;

public:
    static inline std::chrono::time_point<std::chrono::steady_clock> now() { return std::chrono::steady_clock::now(); }

    Clock() : _start(now())
    {
    }

    template<class DurationUnit>
    std::size_t end()
    {
        return std::chrono::duration_cast<DurationUnit>(now() - _start).count();
    }
};

//
// Entry point
//
int main()
{
    Randomizer randomizer;
    std::array<int, 1000> dividends; // SCALE THIS UP (1'000'000 would be great)
    std::array<int, dividends.size()> divisors;
    std::array<int, dividends.size()> results;
    randomizer(std::begin(dividends), std::end(dividends));
    randomizer(std::begin(divisors), std::end(divisors));

    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = round_divide_by_float_casting(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "round_divide_by_float_casting(): " << std::setprecision(3) << unit_time << " ns\n";
    }
    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = round_divide_by_modulo(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "round_divide_by_modulo(): " << std::setprecision(3) << unit_time << " ns\n";
    }
    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = divide_by_quotient_comparison(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "divide_by_quotient_comparison(): " << std::setprecision(3) << unit_time << " ns\n";
    }
}

Outputs:

g++ -std=c++11 -O2 -Wall -Wextra -Werror main.cpp && ./a.out
       round_divide_by_float_casting(): 54.7 ns
              round_divide_by_modulo(): 24 ns
       divide_by_quotient_comparison(): 25.5 ns

Demo

The two arithmetic solutions' performances are not distinguishable (their benchmark converges when you scale the bench size up).

like image 141
YSC Avatar answered Oct 15 '22 10:10

YSC


It would really depend on the processor, and the range of the integer which is better (and using double would resolve most of the range issues)

For modern "big" CPUs like x86-64 and ARM, integer division and floating point division are roughly the same effort, and converting an integer to a float or vice versa is not a "hard" task (and does the correct rounding directly in that conversion, at least), so most likely the resulting operations are.

atmp = (float) a;
btmp = (float) b;
resfloat = divide atmp/btmp;
return = to_int_with_rounding(resfloat)

About four machine instructions.

On the other hand, your code uses two divides, one modulo and a multiply, which is quite likely longer on such a processor.

tmp = a/b;
tmp1 = a % b;
tmp2 = tmp1 * 2;
tmp3 = tmp2 / b;
tmp4 = tmp + tmp3;

So five instructions, and three of those are "divide" (unless the compiler is clever enough to reuse a / b for a % b - but it's still two distinct divides).

Of course, if you are outside the range of number of digits that a float or double can hold without losing digits (23 bits for float, 53 bits for double), then your method MAY be better (assuming there is no overflow in the integer math).

On top of all that, since the first form is used by "everyone", it's the one that the compiler recognises and can optimise.

Obviously, the results depend on both the compiler being used and the processor it runs on, but these are my results from running the code posted above, compiled through clang++ (v3.9-pre-release, pretty close to released 3.8).

   round_divide_by_float_casting(): 32.5 ns
          round_divide_by_modulo(): 113 ns
   divide_by_quotient_comparison(): 80.4 ns

However, the interesting thing I find when I look at the generated code:

xorps   %xmm0, %xmm0
cvtsi2ssl   8016(%rsp,%rbp), %xmm0
xorps   %xmm1, %xmm1
cvtsi2ssl   4016(%rsp,%rbp), %xmm1
divss   %xmm1, %xmm0
callq   roundf
cvttss2si   %xmm0, %eax
movl    %eax, 16(%rsp,%rbp)
addq    $4, %rbp
cmpq    $4000, %rbp             # imm = 0xFA0
jne .LBB0_7

is that the round is actually a call. Which really surprises me, but explains why on some machines (particularly more recent x86 processors), it is faster.

g++ gives better results with -ffast-math, which gives around:

  round_divide_by_float_casting(): 17.6 ns
          round_divide_by_modulo(): 43.1 ns
   divide_by_quotient_comparison(): 18.5 ns

(This is with increased count to 100k values)

like image 33
Mats Petersson Avatar answered Oct 15 '22 09:10

Mats Petersson