There's an old trick to do this with only one comparison/branch. Whether it'll really improve speed may be open to question, and even if it does, it's probably too little to notice or care about, but when you're only starting with two comparisons, the chances of a huge improvement are pretty remote. The code looks like:
// use a < for an inclusive lower bound and exclusive upper bound
// use <= for an inclusive lower bound and inclusive upper bound
// alternatively, if the upper bound is inclusive and you can pre-calculate
// upper-lower, simply add + 1 to upper-lower and use the < operator.
if ((unsigned)(number-lower) <= (upper-lower))
in_range(number);
With a typical, modern computer (i.e., anything using twos complement), the conversion to unsigned is really a nop -- just a change in how the same bits are viewed.
Note that in a typical case, you can pre-compute upper-lower
outside a (presumed) loop, so that doesn't normally contribute any significant time. Along with reducing the number of branch instructions, this also (generally) improves branch prediction. In this case, the same branch is taken whether the number is below the bottom end or above the top end of the range.
As to how this works, the basic idea is pretty simple: a negative number, when viewed as an unsigned number, will be larger than anything that started out as a positive number.
In practice this method translates number
and the interval to the point of origin and checks if number
is in the interval [0, D]
, where D = upper - lower
. If number
below lower bound: negative, and if above upper bound: larger than D
.
It's rare to be able to do significant optimizations to code on such a small scale. Big performance gains come from observing and modifying the code from a higher level. You may be able to eliminate the need for the range test altogether, or only do O(n) of them instead of O(n^2). You may be able to re-order the tests so that one side of the inequality is always implied. Even if the algorithm is ideal, gains are more likely to come when you see how this code does the range test 10 million times and you find a way to batch them up and use SSE to do many tests in parallel.
It depends on how many times you want to perform the test over the same data.
If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.
If you are doing this for a very finite set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the code, which should speed things up.
For your data the lookup table would be 128^3 = 2,097,152. If you can control one of the three variables so you consider all instances where start = N
at one time, then the size of the working set drops down to 128^2 = 16432
bytes, which should fit well in most modern caches.
You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.
This answer is to report on a testing done with the accepted answer. I performed a closed range test on a large vector of sorted random integer and to my surprise the basic method of ( low <= num && num <= high) is in fact faster than the accepted answer above! Test was done on HP Pavilion g6 (AMD A6-3400APU with 6GB ram. Here's the core code used for testing:
int num = rand(); // num to compare in consecutive ranges.
chrono::time_point<chrono::system_clock> start, end;
auto start = chrono::system_clock::now();
int inBetween1{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
if (randVec[i - 1] <= num && num <= randVec[i])
++inBetween1;
}
auto end = chrono::system_clock::now();
chrono::duration<double> elapsed_s1 = end - start;
compared with the following which is the accepted answer above:
int inBetween2{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
if (static_cast<unsigned>(num - randVec[i - 1]) <= (randVec[i] - randVec[i - 1]))
++inBetween2;
}
Pay attention that randVec is a sorted vector. For any size of MaxNum the first method beats the second one on my machine!
For any variable range checking:
if (x >= minx && x <= maxx) ...
It is faster to use bit operation:
if ( ((x - minx) | (maxx - x)) >= 0) ...
This will reduce two branches into one.
If you care about type safe:
if ((int32_t)(((uint32_t)x - (uint32_t)minx) | ((uint32_t)maxx - (uint32_t)x)) > = 0) ...
You can combine more variable range checking together:
if (( (x - minx) | (maxx - x) | (y - miny) | (maxy - y) ) >= 0) ...
This will reduce 4 branches into 1.
It is 3.4 times faster than the old one in gcc:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With