I need to divide two numbers and round it up. Are there any better way to do this?
int myValue = (int) ceil( (float)myIntNumber / myOtherInt );
I find an overkill to have to cast two different time. (the extern int cast is just to shut down the warning)
Note I have to cast internally to float otherwise
int a = ceil(256/11); //> Should be 24, but it is 23
^example
If the divisor and dividend have the same sign then the result is zero or positive. If the divisor and dividend have opposite signs then the result is zero or negative. If the division is inexact then the quotient is rounded up.
Dividing an integer by an integer gives an integer result. 1/2 yields 0; assigning this result to a floating-point variable gives 0.0. To get a floating-point result, at least one of the operands must be a floating-point type. b = a / 350.0f; should give you the result you want.
The division operator / means integer division if there is an integer on both sides of it. If one or two sides has a floating point number, then it means floating point division.
Adding subtracting or multiplying two ints always yields an int result, but division is different. The result of division is always a float value, even if the division comes out even.
Assuming that both myIntNumber
and myOtherInt
are positive, you could do:
int myValue = (myIntNumber + myOtherInt - 1) / myOtherInt;
Since a lot of different methods are shown in the answers and none of the answers actually prove any advantages in terms of performance I tried to benchmark them myself. My plan was to write an answer that contains a short table and a definite answer which method is the fastest.
Unfortunately it wasn't that simple. (It never is.) It seems that the performance of the rounding formulas depend on the used data type, compiler and optimization level.
In one case there is an increase of speed by 7.5× from one method to another. So the impact can be significant for some people.
For long
integers the naive version using a type cast to float
and std::ceil
was actually the fastest. This was interesting for me personally since I intended to use it with size_t
which is usually defined as unsigned long
.
For ordinary int
s it depends on your optimization level. For lower levels @Jwodder's solution performs best. For the highest levels std::ceil
was the optimal one. With one exception: For the clang/unsigned int
combination @Jwodder's was still better.
The solutions from the accepted answer never really outperformed the other two. You should keep in mind however that @Jwodder's solution doesn't work with negatives.
Results are at the bottom.
To recap here are the four methods I benchmarked and compared:
template<typename T>
inline T divCeilJwodder(const T& numerator, const T& denominator)
{
return (numerator + denominator - 1) / denominator;
}
template<typename T>
inline T divCeilVoigtModulo(const T& numerator, const T& denominator)
{
return numerator / denominator + (((numerator < 0) ^ (denominator > 0))
&& (numerator%denominator));
}
inline T divCeilVoigtNoModulo(const T& numerator, const T& denominator)
{
T truncated = numerator / denominator;
return truncated + (((numerator < 0) ^ (denominator > 0))
&& (numerator - truncated*denominator));
}
template<typename T>
inline T divCeilTypeCast(const T& numerator, const T& denominator)
{
return (int)std::ceil((double)numerator / denominator);
}
In a single batch the division is performed 100 million times. Ten batches are calculated for each combination of Compiler/Optimization level, used data type and used implementation. The values shown below are the averages of all 10 batches in milliseconds. The errors that are given are standard deviations.
The whole source code that was used can be found here. Also you might find this script useful which compiles and executes the source with different compiler flags.
The whole benchmark was performed on a i7-7700K. The used compiler versions were GCC 10.2.0 and clang 11.0.1.
Now without further ado here are the results:
DataType Algorithm |
GCC -O0 |
-O1 | -O2 | -O3 | -Os | -Ofast | -Og | clang -O0 |
-O1 | -O2 | -O3 | -Ofast | -Os | -Oz |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
unsigned |
||||||||||||||
Jwodder | 264.1 ± 0.9 🏆 | 175.2 ± 0.9 🏆 | 153.5 ± 0.7 🏆 | 175.2 ± 0.5 🏆 | 153.3 ± 0.5 | 153.4 ± 0.8 | 175.5 ± 0.6 🏆 | 329.4 ± 1.3 🏆 | 220.0 ± 1.3 🏆 | 146.2 ± 0.6 🏆 | 146.2 ± 0.6 🏆 | 146.0 ± 0.5 🏆 | 153.2 ± 0.3 🏆 | 153.5 ± 0.6 🏆 |
VoigtModulo | 528.5 ± 2.5 | 306.5 ± 1.0 | 175.8 ± 0.7 | 175.2 ± 0.5 🏆 | 175.6 ± 0.7 | 175.4 ± 0.6 | 352.0 ± 1.0 | 588.9 ± 6.4 | 408.7 ± 1.5 | 164.8 ± 1.0 | 164.0 ± 0.4 | 164.1 ± 0.4 | 175.2 ± 0.5 | 175.8 ± 0.9 |
VoigtNoModulo | 375.3 ± 1.5 | 175.7 ± 1.3 🏆 | 192.5 ± 1.4 | 197.6 ± 1.9 | 200.6 ± 7.2 | 176.1 ± 1.5 | 197.9 ± 0.5 | 541.0 ± 1.8 | 263.1 ± 0.9 | 186.4 ± 0.6 | 186.4 ± 1.2 | 187.2 ± 1.1 | 197.2 ± 0.8 | 197.1 ± 0.7 |
TypeCast | 348.5 ± 2.7 | 231.9 ± 3.9 | 234.4 ± 1.3 | 226.6 ± 1.0 | 137.5 ± 0.8 🏆 | 138.7 ± 1.7 🏆 | 243.8 ± 1.4 | 591.2 ± 2.4 | 591.3 ± 2.6 | 155.8 ± 1.9 | 155.9 ± 1.6 | 155.9 ± 2.4 | 214.6 ± 1.9 | 213.6 ± 1.1 |
unsigned long |
||||||||||||||
Jwodder | 658.6 ± 2.5 | 546.3 ± 0.9 | 549.3 ± 1.8 | 549.1 ± 2.8 | 540.6 ± 3.4 | 548.8 ± 1.3 | 486.1 ± 1.1 | 638.1 ± 1.8 | 613.3 ± 2.1 | 190.0 ± 0.8 🏆 | 182.7 ± 0.5 | 182.4 ± 0.5 | 496.2 ± 1.3 | 554.1 ± 1.0 |
VoigtModulo | 1,169.0 ± 2.9 | 1,015.9 ± 4.4 | 550.8 ± 2.0 | 504.0 ± 1.4 | 550.3 ± 1.2 | 550.5 ± 1.3 | 1,020.1 ± 2.9 | 1,259.0 ± 9.0 | 1,136.5 ± 4.2 | 187.0 ± 3.4 🏆 | 199.7 ± 6.1 | 197.6 ± 1.0 | 549.4 ± 1.7 | 506.8 ± 4.4 |
VoigtNoModulo | 768.1 ± 1.7 | 559.1 ± 1.8 | 534.4 ± 1.6 | 533.7 ± 1.5 | 559.5 ± 1.7 | 534.3 ± 1.5 | 571.5 ± 1.3 | 879.5 ± 10.8 | 617.8 ± 2.1 | 223.4 ± 1.3 | 231.3 ± 1.3 | 231.4 ± 1.1 | 594.6 ± 1.9 | 572.2 ± 0.8 |
TypeCast | 353.3 ± 2.5 🏆 | 267.5 ± 1.7 🏆 | 248.0 ± 1.6 🏆 | 243.8 ± 1.2 🏆 | 154.2 ± 0.8 🏆 | 154.1 ± 1.0 🏆 | 263.8 ± 1.8 🏆 | 365.5 ± 1.6 🏆 | 316.9 ± 1.8 🏆 | 189.7 ± 2.1 🏆 | 156.3 ± 1.8 🏆 | 157.0 ± 2.2 🏆 | 155.1 ± 0.9 🏆 | 176.5 ± 1.2 🏆 |
int |
||||||||||||||
Jwodder | 307.9 ± 1.3 🏆 | 175.4 ± 0.9 🏆 | 175.3 ± 0.5 🏆 | 175.4 ± 0.6 🏆 | 175.2 ± 0.5 | 175.1 ± 0.6 | 175.1 ± 0.5 🏆 | 307.4 ± 1.2 🏆 | 219.6 ± 0.6 🏆 | 146.0 ± 0.3 🏆 | 153.5 ± 0.5 | 153.6 ± 0.8 | 175.4 ± 0.7 🏆 | 175.2 ± 0.5 🏆 |
VoigtModulo | 528.5 ± 1.9 | 351.9 ± 4.6 | 175.3 ± 0.6 🏆 | 175.2 ± 0.4 🏆 | 197.1 ± 0.6 | 175.2 ± 0.8 | 373.5 ± 1.1 | 598.7 ± 5.1 | 460.6 ± 1.3 | 175.4 ± 0.4 | 164.3 ± 0.9 | 164.0 ± 0.4 | 176.3 ± 1.6 🏆 | 460.0 ± 0.8 |
VoigtNoModulo | 398.0 ± 2.5 | 241.0 ± 0.7 | 199.4 ± 5.1 | 219.2 ± 1.0 | 175.9 ± 1.2 | 197.7 ± 1.2 | 242.9 ± 3.0 | 543.5 ± 2.3 | 350.6 ± 1.0 | 186.6 ± 1.2 | 185.7 ± 0.3 | 186.3 ± 1.1 | 197.1 ± 0.6 | 373.3 ± 1.6 |
TypeCast | 338.8 ± 4.9 | 228.1 ± 0.9 | 230.3 ± 0.8 | 229.5 ± 9.4 | 153.8 ± 0.4 🏆 | 138.3 ± 1.0 🏆 | 241.1 ± 1.1 | 590.0 ± 2.1 | 589.9 ± 0.8 | 155.2 ± 2.4 | 149.4 ± 1.6 🏆 | 148.4 ± 1.2 🏆 | 214.6 ± 2.2 | 211.7 ± 1.6 |
long |
||||||||||||||
Jwodder | 758.1 ± 1.8 | 600.6 ± 0.9 | 601.5 ± 2.2 | 601.5 ± 2.8 | 581.2 ± 1.9 | 600.6 ± 1.8 | 586.3 ± 3.6 | 745.9 ± 3.6 | 685.8 ± 2.2 | 183.1 ± 1.0 | 182.5 ± 0.5 | 182.6 ± 0.6 | 553.2 ± 1.5 | 488.0 ± 0.8 |
VoigtModulo | 1,360.8 ± 6.1 | 1,202.0 ± 2.1 | 600.0 ± 2.4 | 600.0 ± 3.0 | 607.0 ± 6.8 | 599.0 ± 1.6 | 1,187.2 ± 2.6 | 1,439.6 ± 6.7 | 1,346.5 ± 2.9 | 197.9 ± 0.7 | 208.2 ± 0.6 | 208.0 ± 0.4 | 548.9 ± 1.4 | 1,326.4 ± 3.0 |
VoigtNoModulo | 844.5 ± 6.9 | 647.3 ± 1.3 | 628.9 ± 1.8 | 627.9 ± 1.6 | 629.1 ± 2.4 | 629.6 ± 4.4 | 668.2 ± 2.7 | 1,019.5 ± 3.2 | 715.1 ± 8.2 | 224.3 ± 4.8 | 219.0 ± 1.0 | 219.0 ± 0.6 | 561.7 ± 2.5 | 769.4 ± 9.3 |
TypeCast | 366.1 ± 0.8 🏆 | 246.2 ± 1.1 🏆 | 245.3 ± 1.8 🏆 | 244.6 ± 1.1 🏆 | 154.6 ± 1.6 🏆 | 154.3 ± 0.5 🏆 | 257.4 ± 1.5 🏆 | 591.8 ± 4.1 🏆 | 590.4 ± 1.3 🏆 | 154.5 ± 1.3 🏆 | 135.4 ± 8.3 🏆 | 132.9 ± 0.7 🏆 | 132.8 ± 1.2 🏆 | 177.4 ± 2.3 🏆 |
Now I can finally get on with my life :P
With help from DyP, came up with the following branchless formula:
int idiv_ceil ( int numerator, int denominator )
{
return numerator / denominator
+ (((numerator < 0) ^ (denominator > 0)) && (numerator%denominator));
}
It avoids floating-point conversions and passes a basic suite of unit tests, as shown here:
Here's another version that avoids the modulo operator.
int idiv_ceil ( int numerator, int denominator )
{
int truncated = numerator / denominator;
return truncated + (((numerator < 0) ^ (denominator > 0)) &&
(numerator - truncated*denominator));
}
The first one will be faster on processors where IDIV returns both quotient and remainder (and the compiler is smart enough to use that).
Maybe it is just easier to do a:
int result = dividend / divisor;
if(dividend % divisor != 0)
result++;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With