Fastest way to get a positive modulo in C/C++

Tags:

Often in my inner loops I need to index an array in a "wrap-around" way, so that (for example) if the array size is 100 and my code asks for element -2, it should be given element 98. In many high level languages such as Python, one can do this simply with my_array[index % array_size], but for some reason C's integer arithmetic (usually) rounds toward zero instead of consistently rounding down, and consequently its modulo operator returns a negative result when given a negative first argument.

Often I know that index will not be less than -array_size, and in these cases I just do my_array[(index + array_size) % array_size]. However, sometimes this can't be guaranteed, and for those cases I would like to know the fastest way to implement an always-positive modulo function. There are several "clever" ways to do it without branching, such as

inline int positive_modulo(int i, int n) {     return (n + (i % n)) % n; }

inline int positive_modulo(int i, int n) {     return (i % n) + (n * (i < 0)); }

Of course I can profile these to find out which is the fastest on my system, but I can't help worrying that I might have missed a better one, or that what's fast on my machine might be slow on a different one.

So is there a standard way to do this, or some clever trick that I've missed that's likely to be the fastest possible way?

Also, I know it's probably wishful thinking, but if there's a way of doing this that can be auto-vectorised, that would be amazing.

258

asked Feb 21 '13 07:02

N. Virgo

1 Answers

The standard way I learned is

inline int positive_modulo(int i, int n) {     return (i % n + n) % n; }

This function is essentially your first variant without the abs (which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".

Edit:

Moving on to your second variant: First of all, it contains a bug, too -- the n < 0 should be i < 0.

This variant may not look as if it branches, but on a lot of architectures, the i < 0 will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0)) with i < 0? n: 0, which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.

As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.

answered Sep 24 '22 17:09

Martin B

Related questions
                            
                                Does const-correctness give the compiler more room for optimization?
                            
                                Why is Visual C++ lacking refactor functionality?
                            
                                Is the pass-by-value-and-then-move construct a bad idiom?
                            
                                Is it possible to declare constexpr class in a header and define it in a separate .cpp file?
                            
                                Why is the != operator not allowed with OpenMP?
                            
                                C++ inline member function in .cpp file
                            
                                Why does std::transform and similar cast the 'for' loop increment to (void)?
                            
                                C++11 "auto" semantics
                            
                                ULL suffix on a numeric literal
                            
                                error LNK2005: xxx already defined in MSVCRT.lib(MSVCR100.dll) C:\something\LIBCMT.lib(setlocal.obj)
                            
                                C and C++ : Partial initialization of automatic structure
                            
                                C++ chrono system time in milliseconds, time operations
                            
                                Can boolean operators be used with the preprocessor?
                            
                                Is using #pragma warning push/pop the right way to temporarily alter warning level?
                            
                                Difference between .a .o and .lo file
                            
                                rc.exe no longer found in VS 2015 Command Prompt
                            
                                Getting first value from map in C++
                            
                                Why is Python recursion so expensive and what can we do about it?
                            
                                Performance of qsort vs std::sort?
                            
                                Is the behavior of subtracting two NULL pointers defined?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fastest way to get a positive modulo in C/C++

Tags:

c++

performance

c

N. Virgo

People also ask

1 Answers

Martin B

Recent Activity

Donate For Us