Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the minimum and maximum value within a Metal texture

I have a MTLTexture containing 16bit unsigned integers (MTLPixelFormatR16Uint). The values range from about 7000 to 20000, with 0 being used as a 'nodata' value, which is why it is skipped in the code below. I'd like to find the minimum and maximum values so I can rescale these values between 0-255. Ultimately I'll be looking to base the minimum and maximum values on a histogram of the data (it has some outliers), but for now I'm stuck on simply extracting the min/max.

I can read the data from the GPU to CPU and pull the min/max values out but would prefer to perform this task on the GPU.

First attempt

The command encoder is dispatched with 16x16 threads per thread group, the number of thread groups is based on the texture size (eg; width = textureWidth / 16, height = textureHeight / 16).

typedef struct {
    atomic_uint min;
    atomic_uint max;
} BandMinMax;

kernel void minMax(texture2d<ushort, access::read> band1 [[texture(0)]],
                   device BandMinMax &out [[buffer(0)]],
                   uint2 gid [[thread_position_in_grid]])
{
    ushort value = band1.read(gid).r;

    if (value != 0) {
        uint currentMin = atomic_load_explicit(&out.min, memory_order_relaxed);
        uint currentMax = atomic_load_explicit(&out.max, memory_order_relaxed);

        if (value > currentMax) {
            atomic_store_explicit(&out.max, value, memory_order_relaxed);
        }
        if (value < currentMin) {
            atomic_store_explicit(&out.min, value, memory_order_relaxed);
        }
    }
}

From this I get a minimum and maximum value, but for the same dataset the min and max will often return different values. Fairly certain this is the min and max from a single thread when there are multiple threads running.

Second attempt

Building on the previous attempt, this time I'm storing the individual min/max values from each thread, all 256 (16x16).

kernel void minMax(texture2d<ushort, access::read> band1 [[texture(0)]],
                   device BandMinMax *out [[buffer(0)]],
                   uint2 gid [[thread_position_in_grid]],
                   uint tid [[ thread_index_in_threadgroup ]])
{

    ushort value = band1.read(gid).r;

    if (value != 0) {
        uint currentMin = atomic_load_explicit(&out[tid].min, memory_order_relaxed);
        uint currentMax = atomic_load_explicit(&out[tid].max, memory_order_relaxed);

        if (value > currentMax) {
            atomic_store_explicit(&out[tid].max, value, memory_order_relaxed);
        }
        if (value < currentMin) {
            atomic_store_explicit(&out[tid].min, value, memory_order_relaxed);
        }
    }
}

This returns an array containing 256 sets of min/max values. From these I guess I could find the lowest of the minimum values, but this seems like a poor approach. Would appreciate a pointer in the right direction, thanks!

like image 350
lock Avatar asked Apr 16 '16 11:04

lock


People also ask

How do you find maximum and minimum values?

Solve for x.Use basic rules of algebra to rearrange the function and solve the value for x, when the derivative equals zero. This solution will tell you the x-coordinate of the vertex of the function, which is where the maximum or minimum will occur.

How do you find the maximum and minimum value of a function in a range?

We have to find the maximum or minimum value of the function. The maximum or minimum of a quadratic function occurs at x = -b/2a. If a is negative, the maximum value of the function is f(-b/2a). If a is positive, the minimum value of the function is f(-b/2a).

How do you find maximum values?

If you are given the formula y = ax2 + bx + c, then you can find the maximum value using the formula max = c - (b2 / 4a). If you have the equation y = a(x-h)2 + k and the a term is negative, then the maximum value is k.

How do you find the minimum value?

If your quadratic equation has a positive a term, it will also have a minimum value. You can find this minimum value by graphing the function or by using one of the two equations. If you have the equation in the form of y = ax^2 + bx + c, then you can find the minimum value using the equation min = c - b^2/4a.


1 Answers

The Metal Shading Language has atomic compare-and-swap functions you can use to compare the existing value at a memory location with a value, and replace the value at that location if they don't compare equal. With these, you can create a set of atomic compare-and-replace-if-[greater|less]-than operations:

static void atomic_uint_exchange_if_less_than(volatile device atomic_uint *current, uint candidate)
{
    uint val;
    do {
        val = *((device uint *)current);
    } while ((candidate < val || val == 0) && !atomic_compare_exchange_weak_explicit(current,
                                                                                     &val,
                                                                                     candidate,
                                                                                     memory_order_relaxed,
                                                                                     memory_order_relaxed));
}

static void atomic_uint_exchange_if_greater_than(volatile device atomic_uint *current, uint candidate)
{
    uint val;
    do {
        val = *((device uint *)current);
    } while (candidate > val && !atomic_compare_exchange_weak_explicit(current,
                                                                       &val,
                                                                       candidate,
                                                                       memory_order_relaxed,
                                                                       memory_order_relaxed));
}

To apply these, you might create a buffer that contains one interleaved min, max pair per threadgroup. Then, in the kernel function, read from the texture and conditionally write the min and max values:

kernel void min_max_per_threadgroup(texture2d<ushort, access::read> texture [[texture(0)]],
                                    device uint *mapBuffer [[buffer(0)]],
                                    uint2 tpig [[thread_position_in_grid]],
                                    uint2 tgpig [[threadgroup_position_in_grid]],
                                    uint2 tgpg [[threadgroups_per_grid]])
{
    ushort val = texture.read(tpig).r;

    device atomic_uint *atomicBuffer = (device atomic_uint *)mapBuffer;

    atomic_uint_exchange_if_less_than(atomicBuffer + ((tgpig[1] * tgpg[0] + tgpig[0]) * 2),
                                      val);

    atomic_uint_exchange_if_greater_than(atomicBuffer + ((tgpig[1] * tgpg[0] + tgpig[0]) * 2) + 1,
                                         val);
}

Finally, run a separate kernel to reduce over this buffer and collect the final min, max values across the entire texture:

kernel void min_max_reduce(constant uint *mapBuffer [[buffer(0)]],
                           device uint *reduceBuffer [[buffer(1)]],
                           uint2 tpig [[thread_position_in_grid]])
{
    uint minv = mapBuffer[tpig[0] * 2];
    uint maxv = mapBuffer[tpig[0] * 2 + 1];

    device atomic_uint *atomicBuffer = (device atomic_uint *)reduceBuffer;

    atomic_uint_exchange_if_less_than(atomicBuffer, minv);

    atomic_uint_exchange_if_greater_than(atomicBuffer + 1, maxv);
}

Of course, you can only reduce over the total allowed thread execution width of the device (~256), so you may need to do the reduction in multiple passes, with each one reducing the size of the data to be operated on by a factor of the maximum thread execution width.

Disclaimer: This may not be the best technique, but it does appear to be correct in my limited testing of an OS X implementation. It was marginally faster than a naive CPU implementation on a 256x256 texture on Intel Iris Pro, but substantially slower on an Nvidia GT 750M (because of dispatch overhead).

like image 122
warrenm Avatar answered Sep 29 '22 19:09

warrenm