Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing repeated modulus within a loop

I have this statement in my c program and I want to optimize. By optimization I particularly want to refer to bitwise operators (but any other suggestion is also fine).

uint64_t h_one = hash[0];
uint64_t h_two = hash[1];
for ( int i=0; i<k; ++i )
{
    (uint64_t *) k_hash[i] = ( h_one + i * h_two ) % size;   //suggest some optimization for this line.
} 

Any suggestion will be of great help.

Edit: As of now size can be any int but it is not a problem and we can round it up to the next prime (but may be not a power of two as for larger values the power of 2 increases rapidly and it will lead to much wastage of memory)

h_two is a 64 bit int(basically a chuck of 64 bytes).

like image 978
Aman Deep Gautam Avatar asked Jun 15 '12 19:06

Aman Deep Gautam


1 Answers

so essentially you're doing

k_0 = h_1 mod s
k_1 = h_1 + h_2 mod s = k_0 + h_2 mod s
k_2 = h_1 + h_2 + h_2 mod s = k_1 + h_2 mod s
..
k_n = k_(n-1) + h_2 mod s

Depending on overflow issues (which shouldn't differ from the original if size is less than half of 2**64), this could be faster (less easy to parallelize though):

uint64_t h_one = hash[0];
uint64_t h_two = hash[1];
k_hash[0] = h_one % size;
for ( int i=1; i<k; ++i )
{
    (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two ) % size;
} 

Note there is a possibility that your compiler already came to this form, depending on which optimization flags you use.

Of course this only eliminated one multiplication. If you want to eliminate or reduce the modulo, I guess that based on h_two%size and h_1%size you can predetermine the steps where you have to explicitly call %size, something like this:

uint64_t h_one = hash[0]%size;
uint64_t h_two = hash[1]%size;
k_hash[0] = h_one;
step = (size-(h_one))/(h_two)-1;
for ( int i=1; i<k; ++i )
{
    (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two );
    if(i==step)
    {
        k_hash[i] %= size;
    }
} 

Note I'm not sure of the formula (didn't test it), it's more a general idea. This would greatly depend on how good your branch prediction is (and how big a performance-hit a misprediction is). ALso it's only likely to help if step is big.

edit: or more simple (and probably with the same performance) -thanks to Mystical:

uint64_t h_one = hash[0]%size;
uint64_t h_two = hash[1]%size;
k_hash[0] = h_one;
for ( int i=1; i<k; ++i )
{
    (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two );
    if(k_hash[i] > size)
    {
        k_hash[i] -= size;
    }
} 
like image 137
KillianDS Avatar answered Nov 07 '22 11:11

KillianDS