Safest and most efficient way to compute an integer operation that may overflow

Suppose we have two constants A and B and a variable i, all 64-bit integers, and we want to compute a simple, common arithmetic operation such as:

i * A / B    (1)

To simplify the problem, let's assume that variable i is always in the range [INT64_MIN*B/A, INT64_MAX*B/A], so that the final result of operation (1) does not overflow (i.e., it fits in the range [INT64_MIN, INT64_MAX]).

In addition, i is assumed to be more likely in the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A] (i.e., close to 0), although i may (less likely) fall outside this range. In the first case, a trivial integer computation of i * A does not overflow (which is why we call the range friendly); in the latter case, a trivial integer computation of i * A overflows, leading to an erroneous result for (1).

What would be the "safest" and "most efficient" way to compute operation (1), where "safest" means preserving exactness or at least decent precision, and "most efficient" means lowest average computation time, given that i is more likely in the friendly range Range1?

The solution currently implemented in the code is the following:

(int64_t)((double)A / B * i)

This solution is quite safe (no overflow) though inaccurate (precision is lost because of the 53-bit significand of double), and quite fast because the double division (double)A / B is precomputed at compile time, leaving only a double multiplication to be computed at run time.

asked Apr 24 '16 17:04

shrike

1 Answer

If you cannot get better bounds on the ranges involved then you're best off following iammilind's advice to use __int128.
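
As a rough sketch of that approach (not a definitive implementation), the following assumes GCC or Clang, which provide both `__int128` and `__builtin_mul_overflow`. The function name `muldiv_i128` is made up for illustration. It tries the plain 64-bit product first, since i is expected to be in the friendly range most of the time, and only falls back to the 128-bit product when that overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes i * a / b with truncating division.
   Assumptions: GCC or Clang (for __int128 and __builtin_mul_overflow),
   b != 0, and a final result that fits in int64_t, as the question
   guarantees. */
static int64_t muldiv_i128(int64_t i, int64_t a, int64_t b)
{
    assert(b != 0);

    int64_t prod;
    /* Fast path: the 64-bit product did not overflow (friendly range). */
    if (!__builtin_mul_overflow(i, a, &prod))
        return prod / b;

    /* Slow path: form the exact 128-bit product, then divide. */
    return (int64_t)(((__int128)i * a) / b);
}
```

The overflow check compiles to a multiplication followed by a branch on the overflow flag on common targets, so the friendly case should cost little more than the naive expression.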

The reason is that otherwise you would have to implement the full logic of word-to-double-word multiplication and double-word-by-word division. The Intel and AMD processor manuals contain helpful information and ready-made code, but it gets quite involved, and doing it in C/C++ instead of assembler makes things doubly complicated.
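
To give an idea of what the multiplication half looks like in plain C, here is a sketch of an unsigned 64x64 to 128-bit product built from 32-bit halves (schoolbook multiplication). The type `u128_parts` and the function `umul64x64` are names invented for this example. The matching division is the more involved part and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two 64-bit halves of a 128-bit value. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Hypothetical helper: full 64x64 -> 128-bit unsigned product. */
static u128_parts umul64x64(uint64_t x, uint64_t y)
{
    /* Split both operands into 32-bit halves. */
    uint64_t x0 = (uint32_t)x, x1 = x >> 32;
    uint64_t y0 = (uint32_t)y, y1 = y >> 32;

    /* Four partial products, each at most (2^32 - 1)^2. */
    uint64_t p00 = x0 * y0;
    uint64_t p01 = x0 * y1;
    uint64_t p10 = x1 * y0;
    uint64_t p11 = x1 * y1;

    /* Middle column: three 32-bit terms, so the sum stays below 2^34. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    assert(r.lo == x * y);  /* low half must equal the wrapping product */
    return r;
}
```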

All good compilers expose useful primitives as intrinsics. Microsoft's list doesn't seem to include a muldiv-like primitive, but the _mul128 intrinsic gives you the two halves of the 128-bit product as two 64-bit integers. Based on that you can perform long division of two digits by one digit, where one 'digit' is a 64-bit integer (usually called a 'limb' because it is bigger than a digit but still only part of the whole). That is still quite involved, but a lot better than using pure C/C++. However, portability-wise it is no better than using __int128 directly, and with __int128 the compiler implementers have already done all the hard work for you.

If your application domain can give you useful bounds, for example that (u % d) * v will not overflow, then you can use the identity

(u * v) / d = (u / d) * v + ((u % d) * v) / d

where / signifies integer division, as long as u is non-negative and d is positive (otherwise you might run afoul of the leeway allowed for the semantics of operator %).
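
A minimal sketch of the identity in C, using unsigned operands so that the conditions on signs hold automatically. The name `muldiv_split` is hypothetical, and the code assumes the caller already knows that (u % d) * v and the final result both fit in 64 bits; it does not check this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: u * v / d without forming the full product u * v.
   Assumptions (not checked): (u % d) * v fits in 64 bits, and so does
   the final result. */
static uint64_t muldiv_split(uint64_t u, uint64_t v, uint64_t d)
{
    assert(d > 0);
    uint64_t q = u / d;   /* whole multiples of d contained in u */
    uint64_t r = u % d;   /* remainder, always smaller than d */
    return q * v + (r * v) / d;
}
```

For example, `muldiv_split(UINT64_MAX, 3, 6)` yields `UINT64_MAX / 2` even though `UINT64_MAX * 3` does not fit in 64 bits.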

In any case you may have to separate out the signs of the operands and use unsigned operations in order to find more useful mechanisms that you can exploit - or to circumvent sabotage by the compiler, like the saturating multiplication that you mentioned. Overflow of signed integer operations invokes undefined behaviour, so compilers are free to do whatever they please. By contrast, overflow for unsigned types is well-defined.
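
Separating out the signs could look roughly like the sketch below: take the magnitudes as unsigned values, do all the arithmetic unsigned, and reapply the sign at the end. The names `umag` and `muldiv_signed` are invented for this example, and `unsigned __int128` again assumes GCC or Clang. The final conversion back to int64_t relies on the two's-complement conversion those compilers document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: magnitude of x as an unsigned value. The negation
   is done in unsigned (modular) arithmetic, so INT64_MIN is handled
   without signed overflow. */
static uint64_t umag(int64_t x)
{
    return x < 0 ? 0 - (uint64_t)x : (uint64_t)x;
}

/* Hypothetical helper: i * a / b, truncating toward zero.
   Assumptions: GCC or Clang, b != 0, result fits in int64_t. */
static int64_t muldiv_signed(int64_t i, int64_t a, int64_t b)
{
    assert(b != 0);

    /* The result is negative iff an odd number of operands is negative. */
    bool negative = (i < 0) ^ (a < 0) ^ (b < 0);

    unsigned __int128 product = (unsigned __int128)umag(i) * umag(a);
    uint64_t q = (uint64_t)(product / umag(b));

    /* Negate in unsigned arithmetic, then convert back to signed. */
    return negative ? (int64_t)(0 - q) : (int64_t)q;
}
```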

Also, with unsigned types you can fall back on rules such as this one: with s = a (+) b (where (+) is possibly-overflowing unsigned addition), you will have either s == a + b or s < a && s < b, which lets you detect overflow after the fact with cheap operations.
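
In C this rule amounts to a single comparison after the addition. The helper name `uadd_overflows` is made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the wrapping sum a + b in *sum and
   reports whether the true sum did not fit in 64 bits. */
static bool uadd_overflows(uint64_t a, uint64_t b, uint64_t *sum)
{
    assert(sum != 0);
    *sum = a + b;      /* wraps modulo 2^64, which is well-defined */
    return *sum < a;   /* a wrapped sum is smaller than either operand */
}
```

Checking against one operand is enough, because a wrapped sum is smaller than both a and b.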

However, it is unlikely that you will get much farther on this road because the effort required quickly approaches - or even exceeds - the effort of implementing the double-limb operations I alluded to earlier. Only a thorough analysis of the application domain could give the information required for planning/deploying such shortcuts. In the general case and with the bounds you have given you're pretty much out of luck.

answered Oct 24 '22 04:10

DarthGizka
