Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating a maximum value from multiple threads

Is there a way to update a maximum from multiple threads using atomic operations?

Illustrative example:

std::vector<float> coord_max(128);
#pragma omp parallel for
for (int i = 0; i < limit; ++i) {
    int j = get_coord(i); // can return any value in range [0,128)
    float x = compute_value(j, i);
    #pragma omp critical (coord_max_update)
    coord_max[j] = std::max(coord_max[j], x);
}

In the above case, the critical section synchronizes access to the entire vector, whereas we only need to synchronize access to each of the values independently.

like image 829
Krzysztof Kosiński Avatar asked Feb 18 '13 11:02

Krzysztof Kosiński


2 Answers

Following a suggestion in a comment, I found a solution that does not require locking and instead uses the compare-and-exchange functionality found in std::atomic / boost::atomic. I am limited to C++03 so I would use boost::atomic in this case.

BOOST_STATIC_ASSERT(sizeof(int) == sizeof(float));
union FloatPun { float f; int i; };

std::vector< boost::atomic<int> > coord_max(128);
#pragma omp parallel for
for (int i = 0; i < limit; ++i) {
    int j = get_coord(i);
    FloatPun x, maxval;
    x.f = compute_value(j, i);

    maxval.i = coord_max[j].load(boost::memory_order_relaxed);
    do {
        if (maxval.f >= x.f) break;
    } while (!coord_max[j].compare_exchange_weak(maxval.i, x.i,
        boost::memory_order_relaxed));
}

There is some boilerplate involved in putting float values in ints, since it seems that atomic floats are not lock-free. I am not 100% use about the memory order, but the least restrictive level 'relaxed' seems to be OK, since non-atomic memory is not involved.

like image 163
Krzysztof Kosiński Avatar answered Nov 08 '22 20:11

Krzysztof Kosiński


Not sure about the syntax, but algorithmically, you have three choices:

  1. Lock down the entire vector to guarantee atomic access (which is what you are currently doing).

  2. Lock down individual elements, so that every element can be updated independent of others. Pros: maximum parallelism; Cons: lots of locks required!

  3. Something in-between! Conceptually think of partitioning your vector into 16 (or 32/64/...) "banks" as follows: bank0 consists of vector elements 0, 16, 32, 48, 64, ... bank1 consists of vector elements 1, 17, 33, 49, 65, ... bank2 consists of vector elements 2, 18, 34, 50, 66, ... ... Now, use 16 explicit locks before you access the element and you can have upto 16-way parallelism. To access element n, acquire lock (n%16), finish the access, then release the same lock.

like image 43
Rahul Banerjee Avatar answered Nov 08 '22 20:11

Rahul Banerjee