Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the effect of ordering if...else if statements by probability?

People also ask

Why the order of the IF-else else if statements matter?

Remember that if/else-if/else executes in order and will stop when it finds an acceptable case. In your example, if a=5, b=5, c=5 , you'll return c when you should be returning 0. This is because a==b and a==b && b==c are not mutually exclusive - they can both be true, so the order matters.

What happens if two if statements are true?

Use two if statements if both if statement conditions could be true at the same time. In this example, both conditions can be true. You can pass and do great at the same time. Use an if/else statement if the two conditions are mutually exclusive meaning if one condition is true the other condition must be false.

What are the advantages of if-else statement?

Advantages: if-else statement helps us to make decision in programming and execute the right code. It also helps in debugging of code.

What is the purpose of using if/then else logic when programming?

The IF-THEN-ELSE statement enables you to execute different Logic statements based on the value of an input, for example, to control the value of an item based on the value of an input item.


As a general rule, most if not all Intel CPUs assume forward branches are not taken the first time they see them. See Godbolt's work.

After that, the branch goes into a branch prediction cache, and past behavior is used to inform future branch prediction.

So in a tight loop, the effect of misordering is going to be relatively small. The branch predictor is going to learn which set of branches is most likely, and if you have non-trivial amount of work in the loop the small differences won't add up much.

In general code, most compilers by default (lacking another reason) will order the produced machine code roughly the way you ordered it in your code. Thus if statements are forward branches when they fail.

So you should order your branches in the order of decreasing likelihood to get the best branch prediction from a "first encounter".

A microbenchmark that loops tightly many times over a set of conditions and does trivial work is going to dominated by tiny effects of instruction count and the like, and little in the way of relative branch prediction issues. So in this case you must profile, as rules of thumb won't be reliable.

On top of that, vectorization and many other optimizations apply to tiny tight loops.

So in general code, put most likely code within the if block, and that will result in the fewest un-cached branch prediction misses. In tight loops, follow the general rule to start, and if you need to know more you have little choice but to profile.

Naturally this all goes out the window if some tests are far cheaper than others.


I made up the following test to time the execution of two different if...else if blocks, one sorted in order of probability, the other sorted in reverse order:

#include <chrono>
#include <iostream>
#include <random>
#include <algorithm>
#include <iterator>
#include <functional>

using namespace std;

int main()
{
    long long sortedTime = 0;
    long long reverseTime = 0;

    for (int n = 0; n != 500; ++n)
    {
        //Generate a vector of 5000 random integers from 1 to 100
        random_device rnd_device;
        mt19937 rnd_engine(rnd_device());
        uniform_int_distribution<int> rnd_dist(1, 100);
        auto gen = std::bind(rnd_dist, rnd_engine);
        vector<int> rand_vec(5000);
        generate(begin(rand_vec), end(rand_vec), gen);

        volatile int nLow, nMid, nHigh;
        chrono::time_point<chrono::high_resolution_clock> start, end;

        //Sort the conditional statements in order of increasing likelyhood
        nLow = nMid = nHigh = 0;
        start = chrono::high_resolution_clock::now();
        for (int& i : rand_vec) {
            if (i >= 95) ++nHigh;               //Least likely branch
            else if (i < 20) ++nLow;
            else if (i >= 20 && i < 95) ++nMid; //Most likely branch
        }
        end = chrono::high_resolution_clock::now();
        reverseTime += chrono::duration_cast<chrono::nanoseconds>(end-start).count();

        //Sort the conditional statements in order of decreasing likelyhood
        nLow = nMid = nHigh = 0;
        start = chrono::high_resolution_clock::now();
        for (int& i : rand_vec) {
            if (i >= 20 && i < 95) ++nMid;  //Most likely branch
            else if (i < 20) ++nLow;
            else if (i >= 95) ++nHigh;      //Least likely branch
        }
        end = chrono::high_resolution_clock::now();
        sortedTime += chrono::duration_cast<chrono::nanoseconds>(end-start).count();

    }

    cout << "Percentage difference: " << 100 * (double(reverseTime) - double(sortedTime)) / double(sortedTime) << endl << endl;
}

Using MSVC2017 with /O2, the results show that the sorted version is consistently about 28% faster than the unsorted version. Per luk32's comment, I also switched the order of the two tests, which makes a noticeable difference (22% vs 28%). The code was run under Windows 7 on an Intel Xeon E5-2697 v2. This is, of course, very problem-specific and should not be interpreted as a conclusive answer.


No you should not, unless you are really sure that target system is affected. By default go by readability.

I highly doubt your results. I've modified your example a bit, so reversing execution is easier. Ideone rather consistently shows that reverse-order is faster, though not much. On certain runs even this occasionally flipped. I'd say the results are inconclusive. coliru reports no real difference as well. I can check Exynos5422 CPU on my odroid xu4 later on.

The thing is that modern CPUs have branch predictors. There is much-much logic dedicated to pre-fetching both data and instructions, and modern x86 CPUs are rather smart, when it comes to this. Some slimmer architectures like ARMs or GPUs might be vulnerable to this. But it is really highly dependent on both compiler and target system.

I would say that branch ordering optimization is pretty fragile and ephemeral. Do it only as some really fine-tuning step.

Code:

#include <chrono>
#include <iostream>
#include <random>
#include <algorithm>
#include <iterator>
#include <functional>

using namespace std;

int main()
{
    //Generate a vector of random integers from 1 to 100
    random_device rnd_device;
    mt19937 rnd_engine(rnd_device());
    uniform_int_distribution<int> rnd_dist(1, 100);
    auto gen = std::bind(rnd_dist, rnd_engine);
    vector<int> rand_vec(5000);
    generate(begin(rand_vec), end(rand_vec), gen);
    volatile int nLow, nMid, nHigh;

    //Count the number of values in each of three different ranges
    //Run the test a few times
    for (int n = 0; n != 10; ++n) {

        //Run the test again, but now sort the conditional statements in reverse-order of likelyhood
        {
          nLow = nMid = nHigh = 0;
          auto start = chrono::high_resolution_clock::now();
          for (int& i : rand_vec) {
              if (i >= 95) ++nHigh;               //Least likely branch
              else if (i < 20) ++nLow;
              else if (i >= 20 && i < 95) ++nMid; //Most likely branch
          }
          auto end = chrono::high_resolution_clock::now();
          cout << "Reverse-sorted: \t" << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << "ns" << endl;
        }

        {
          //Sort the conditional statements in order of likelyhood
          nLow = nMid = nHigh = 0;
          auto start = chrono::high_resolution_clock::now();
          for (int& i : rand_vec) {
              if (i >= 20 && i < 95) ++nMid;  //Most likely branch
              else if (i < 20) ++nLow;
              else if (i >= 95) ++nHigh;      //Least likely branch
          }
          auto end = chrono::high_resolution_clock::now();
          cout << "Sorted:\t\t\t" << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << "ns" << endl;
        }
        cout << endl;
    }
}

Just my 5 cents. It seems the effect of ordering if statements should depend on:

  1. Probability of each if statement.

  2. Number of iterations, so the branch predictor could kick in.

  3. Likely/unlikely compiler hints, i.e. code layout.

To explore those factors, I benchmarked the following functions:

ordered_ifs()

for (i = 0; i < data_sz * 1024; i++) {
    if (data[i] < check_point) // highly likely
        s += 3;
    else if (data[i] > check_point) // samewhat likely
        s += 2;
    else if (data[i] == check_point) // very unlikely
        s += 1;
}

reversed_ifs()

for (i = 0; i < data_sz * 1024; i++) {
    if (data[i] == check_point) // very unlikely
        s += 1;
    else if (data[i] > check_point) // samewhat likely
        s += 2;
    else if (data[i] < check_point) // highly likely
        s += 3;
}

ordered_ifs_with_hints()

for (i = 0; i < data_sz * 1024; i++) {
    if (likely(data[i] < check_point)) // highly likely
        s += 3;
    else if (data[i] > check_point) // samewhat likely
        s += 2;
    else if (unlikely(data[i] == check_point)) // very unlikely
        s += 1;
}

reversed_ifs_with_hints()

for (i = 0; i < data_sz * 1024; i++) {
    if (unlikely(data[i] == check_point)) // very unlikely
        s += 1;
    else if (data[i] > check_point) // samewhat likely
        s += 2;
    else if (likely(data[i] < check_point)) // highly likely
        s += 3;
}

data

The data array contains random numbers between 0 and 100:

const int RANGE_MAX = 100;
uint8_t data[DATA_MAX * 1024];

static void data_init(int data_sz)
{
    int i;
        srand(0);
    for (i = 0; i < data_sz * 1024; i++)
        data[i] = rand() % RANGE_MAX;
}

The Results

The following results are for Intel i5@3,2 GHz and G++ 6.3.0. The first argument is the check_point (i.e. probability in %% for the highly likely if statement), the second argument is data_sz (i.e. number of iterations).

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4                    4660 ns       4658 ns     150948
ordered_ifs/50/8                   25636 ns      25635 ns      27852
ordered_ifs/75/4                    4326 ns       4325 ns     162613
ordered_ifs/75/8                   18242 ns      18242 ns      37931
ordered_ifs/100/4                   1673 ns       1673 ns     417073
ordered_ifs/100/8                   3381 ns       3381 ns     207612
reversed_ifs/50/4                   5342 ns       5341 ns     126800
reversed_ifs/50/8                  26050 ns      26050 ns      26894
reversed_ifs/75/4                   3616 ns       3616 ns     193130
reversed_ifs/75/8                  15697 ns      15696 ns      44618
reversed_ifs/100/4                  3738 ns       3738 ns     188087
reversed_ifs/100/8                  7476 ns       7476 ns      93752
ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
ordered_ifs_with_hints/75/4         3165 ns       3165 ns     218492
ordered_ifs_with_hints/75/8        13785 ns      13785 ns      50574
ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687
ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205
reversed_ifs_with_hints/50/4        6573 ns       6572 ns     105629
reversed_ifs_with_hints/50/8       27351 ns      27351 ns      25568
reversed_ifs_with_hints/75/4        3537 ns       3537 ns     197470
reversed_ifs_with_hints/75/8       16130 ns      16130 ns      43279
reversed_ifs_with_hints/100/4       3737 ns       3737 ns     187583
reversed_ifs_with_hints/100/8       7446 ns       7446 ns      93782

Analysis

1. The Ordering Does Matter

For 4K iterations and (almost) 100% probability of highly liked statement the difference is huge 223%:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/100/4                   1673 ns       1673 ns     417073
reversed_ifs/100/4                  3738 ns       3738 ns     188087

For 4K iterations and 50% probability of highly liked statement the difference is about 14%:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4                    4660 ns       4658 ns     150948
reversed_ifs/50/4                   5342 ns       5341 ns     126800

2. Number of Iterations Does Matter

The difference between 4K and 8K iterations for (almost) 100% probability of highly liked statement is about two times (as expected):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/100/4                   1673 ns       1673 ns     417073
ordered_ifs/100/8                   3381 ns       3381 ns     207612

But the difference between 4K and 8K iterations for 50% probability of highly liked statement is 5,5 times:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4                    4660 ns       4658 ns     150948
ordered_ifs/50/8                   25636 ns      25635 ns      27852

Why is so? Because of branch predictor misses. Here is the branch misses for each mentioned above case:

ordered_ifs/100/4    0.01% of branch-misses
ordered_ifs/100/8    0.01% of branch-misses
ordered_ifs/50/4     3.18% of branch-misses
ordered_ifs/50/8     15.22% of branch-misses

So on my i5 the branch predictor fails spectacularly for not-so-likely branches and large data sets.

3. Hints Help a Bit

For 4K iterations the results are somewhat worse for 50% probability and somewhat better for close to 100% probability:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/4                    4660 ns       4658 ns     150948
ordered_ifs/100/4                   1673 ns       1673 ns     417073
ordered_ifs_with_hints/50/4         5551 ns       5551 ns     125160
ordered_ifs_with_hints/100/4        1575 ns       1575 ns     437687

But for 8K iterations the results are always a bit better:

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ordered_ifs/50/8                   25636 ns      25635 ns      27852
ordered_ifs/100/8                   3381 ns       3381 ns     207612
ordered_ifs_with_hints/50/8        23191 ns      23190 ns      30028
ordered_ifs_with_hints/100/8        3130 ns       3130 ns     221205

So, the hints also help, but just a tiny bit.

Overall conclusion is: always benchmark the code, because the results may surprise.

Hope that helps.


Based on some of the other answers here, it looks like the only real answer is: it depends. It depends on at least the following (though not necessarily in this order of importance):

  • Relative probability of each branch. This is the original question that was asked. Based on the existing answers, there seems to be some conditions under which ordering by probability helps, but it appears to not always be the case. If the relative probabilities are not very different, then it is unlikely to make any difference what order they are in. However, if the first condition happens 99.999% of the time and the next one is a fraction of what is left, then I would assume that putting the most likely one first would be beneficial in terms of timing.
  • Cost of calculating the true/false condition for each branch. If the time cost of testing the conditions is really high for one branch versus another, then this is likely to have a significant impact on the timing and efficiency. For example, consider a condition that takes 1 time unit to calculate (e.g., checking the state of a Boolean variable) versus another condition that takes tens, hundreds, thousands, or even millions of time units to calculate (e.g., checking the contents of a file on disk or performing a complex SQL query against a large database). Assuming the code checks the conditions in order each time, the faster conditions should probably be first (unless they are dependent on other conditions failing first).
  • Compiler/Interpreter Some compilers (or interpreters) may include optimizations of one kind of another that can affect performance (and some of these are only present if certain options are selected during compilation and/or execution). So unless you are benchmarking two compilations and executions of otherwise identical code on the same system using the exact same compiler where the only difference is the order of the branches in question, you're going to have to give some leeway for compiler variations.
  • Operating System/Hardware As mentioned by luk32 and Yakk, various CPUs have their own optimizations (as do operating systems). So benchmarks are again susceptible to variation here.
  • Frequency of code block execution If the block that includes the branches is rarely accessed (e.g., only once during startup), then it probably matters very little what order you put the branches. On the other hand, if your code is hammering away at this code block during a critical part of your code, then ordering may matter a lot (depending on benchmarks).

The only way to know for certain is to benchmark your specific case, preferably on a system identical to (or very similar to) the intended system on which the code will finally run. If it is intended to run on a set of varying systems with differing hardware, operating system, etc., then it is a good idea to benchmark across multiple variations to see which is best. It may even be a good idea to have the code be compiled with one ordering on one type of system and another ordering on another type of system.

My personal rule of thumb (for most cases, in the absence of a benchmark) is to order based on:

  1. Conditions that rely on the result of prior conditions,
  2. Cost of computing the condition, then
  3. Relative probability of each branch.

The way I usually see this solved for high-performance code is keeping the order that is most readable, but providing hints to the compiler. Here is one example from Linux kernel:

if (likely(access_ok(VERIFY_READ, from, n))) {
    kasan_check_write(to, n);
    res = raw_copy_from_user(to, from, n);
}
if (unlikely(res))
    memset(to + (n - res), 0, res);

Here the assumption is that access check will pass, and that no error is returned in res. Trying to reorder either of these if clauses would just confuse the code, but the likely() and unlikely() macros actually help readability by pointing out what is the normal case and what is the exception.

The Linux implementation of those macros uses GCC specific features. It seems that clang and Intel C compiler support the same syntax, but MSVC doesn't have such feature.


Also depends on your compiler and the platform you’re compiling for.

In theory, the most likely condition should make the control jump as less as possible.

Typically the most likely condition should be first:

if (most_likely) {
     // most likely instructions
} else …

The most popular asm’s are based on conditional branches that jump when condition is true. That C code will be likely translated to such pseudo asm:

jump to ELSE if not(most_likely)
// most likely instructions
jump to end
ELSE:
…

This is because jumps make the cpu cancel the execution pipeline and stall because the program counter changed (for architectures that support pipelines which are really common). Then it’s about the compiler, which may or may not apply some sophisticated optimizations about having the statistically most probably condition to get the control make less jumps.