Updating a maximum value from multiple threads

Question

Is there a way to update a maximum from multiple threads using atomic operations?

Illustrative example:

std::vector<float> coord_max(128);
#pragma omp parallel for
for (int i = 0; i < limit; ++i) {
    int j = get_coord(i); // can return any value in range [0,128)
    float x = compute_value(j, i);
    #pragma omp critical (coord_max_update)
    coord_max[j] = std::max(coord_max[j], x);
}

In the above case, the critical section synchronizes access to the entire vector, whereas we only need to synchronize access to each of the values independently.

Krzysztof Kosiński · Accepted Answer

Following a suggestion in a comment, I found a solution that does not require locking and instead uses the compare-and-exchange functionality found in std::atomic / boost::atomic. I am limited to C++03 so I would use boost::atomic in this case.

BOOST_STATIC_ASSERT(sizeof(int) == sizeof(float));
union FloatPun { float f; int i; };

std::vector< boost::atomic<int> > coord_max(128);
#pragma omp parallel for
for (int i = 0; i < limit; ++i) {
    int j = get_coord(i);
    FloatPun x, maxval;
    x.f = compute_value(j, i);

    maxval.i = coord_max[j].load(boost::memory_order_relaxed);
    do {
        if (maxval.f >= x.f) break;
    } while (!coord_max[j].compare_exchange_weak(maxval.i, x.i,
        boost::memory_order_relaxed));
}

There is some boilerplate involved in putting float values in ints, since it seems that atomic floats are not lock-free. I am not 100% use about the memory order, but the least restrictive level 'relaxed' seems to be OK, since non-atomic memory is not involved.

Rahul Banerjee · Answer

Not sure about the syntax, but algorithmically, you have three choices:

Lock down the entire vector to guarantee atomic access (which is what you are currently doing).
Lock down individual elements, so that every element can be updated independent of others. Pros: maximum parallelism; Cons: lots of locks required!
Something in-between! Conceptually think of partitioning your vector into 16 (or 32/64/...) "banks" as follows: bank0 consists of vector elements 0, 16, 32, 48, 64, ... bank1 consists of vector elements 1, 17, 33, 49, 65, ... bank2 consists of vector elements 2, 18, 34, 50, 66, ... ... Now, use 16 explicit locks before you access the element and you can have upto 16-way parallelism. To access element n, acquire lock (n%16), finish the access, then release the same lock.

Updating a maximum value from multiple threads

Tags:

c++

multithreading

atomic

openmp

Krzysztof Kosiński

2 Answers

Krzysztof Kosiński

Rahul Banerjee

Recent Activity

Donate For Us

Updating a maximum value from multiple threads

Tags:

c++

multithreading

atomic

openmp

Krzysztof Kosiński

2 Answers

Krzysztof Kosiński

Rahul Banerjee

Related questions

Recent Activity

Donate For Us