Quickly generating the "triangle sequence": avoiding mispredictions

Question

I'm interested in calculating the triangle sequence¹, which is the sequence of pairs (i, j): (0, 0), (1, 0), (1, 1), (2, 0), (2, 1) ... which iterates though all pairs (i, j) with the restriction that i >= j. The same sequence with but with the restriction i > j is also interesting.

These sequences represent, among others things, all the ways to choose 2 (possibly identical) elements from a n-element set (for the sequence up to (n, n)²), or the indices of the lower triagular elements of a matrix³. The sequence of values for i alone is A003056 in OEIS, while j alone is A002262. The sequence frequently arises in combinartorial algorithms, where their performance may be critical.

A simple but branchy way to generate the next value in the sequence is:

if (i == j) {
    j = 0;
    i++;
  } else {
    j++;
  }      
}

However, this suffers from many mispredicts while calculating the initial elements of the sequence, when checking the condition (i == j) - generally one mispredict each time i is incremented. As the sequence increases, the number of mispredicts becomes lower since i is incremented with reduced frequency, so the j++ branch dominates and is well predicted. Still, some types of combinatorial search repeatedly iterate over the small terms in the sequence, so I'm looking for a branch-free approach or some other approach that suffers fewer mispredicts.

For many uses, the order of the sequences isn't as important, so generating the values in differnet order than above is a allowable if it leads to a better solution. For example, j could count down rather than up: (0, 0), (1, 1), (1, 0), (2, 2), (2, 1), ....

¹ I'm also interested in knowing what the right name for this sequence is (perhaps so I make a better title for this question). I just kind of made up "triangle sequence".

² Here, the i >= j version represents sub-multisets (repetition allowed), while the i > j variant represents normal subsets (no repetition).

³ Here, the i >= j version includes the main diagonal, while the i > j variant excludes it.

Evgeny Kluev · Accepted Answer

Here are two branch-free approaches that do not use any expensive calculations. First one uses comparison and logical AND:

const bool eq = i == j;
i += eq;
j = (j + 1) & (eq - 1);

Second one uses comparison and multiplication:

const bool eq = i == j;
i += eq;
j = (j + 1) * (1 - eq);

In theory "multiplication" variant should be slower than "logical" one, but measurements show very little difference.

Both approaches would result in branchless code only for processors that allow branchless comparisons (for example x86). Also these approaches assume to be implemented using a language where results of conditional expressions could be easily converted to integers (for example C/C++, where "false" comparisons are converted to zero integers, and "true" ones - to integers equal to "1").

The only problem with these approaches is performance. They could in theory outperform branchy code, but only when mispredicts are really frequent. A simple test where there is no other work besides generating "triangle sequence" (see it on ideone) shows miserable mispredict rate and therefore both branchless methods about 3 times slower than branchy one. The explanation is simple: there should be not much mispredicts for longer sequences; as for shorter ones, modern processors have very good branch predictors that almost never fail in case of short branch patterns; so we have not many mispredicts, branchy code almost always executes only 2 instructions (compare, increment), while branchless code executes both active and incative "branches" plus some instructions specific to branchless approach.

In case you want to repeatedly iterate over the small terms in the sequence, probably other approach would be preferable. You calculate the sequence only once, then repeatedly read it from memory.

Quickly generating the "triangle sequence": avoiding mispredictions

Tags:

performance

language-agnostic

bit-manipulation

sequences

oeis

BeeOnRope

1 Answers

Evgeny Kluev

Recent Activity

Donate For Us

Quickly generating the "triangle sequence": avoiding mispredictions

Tags:

performance

language-agnostic

bit-manipulation

sequences

oeis

BeeOnRope

1 Answers

Evgeny Kluev

Related questions

Recent Activity

Donate For Us