I want to apply Arnold's cat map to my matrix. Here is my implementation:
```python
import numpy as np

def cat_mapping(matrix, MAX):
    width, height = matrix.shape
    transformed_matrix = np.empty([width, height]).astype(np.uint8)
    counter = 1
    x = np.repeat(np.arange(width).reshape(-1, 1), height, axis=-1).T
    y = np.repeat(np.arange(height).reshape(-1, 1), width, axis=-1)
    nx = (2 * x + y) % width
    ny = (x + y) % height
    while counter <= MAX:
        transformed_matrix[ny, nx] = matrix[y, x]
        matrix = transformed_matrix
        if counter != MAX:
            transformed_matrix = np.empty([width, height])
        counter = counter + 1
    return transformed_matrix
```
This works perfectly. But when the size of the array grows beyond 10000×10000 with a bigger iteration value MAX, the implementation becomes really slow. Even with numba, the result is not satisfactory.

I was thinking: could the transformation be broken into smaller parts and the results combined, the way Divide and Conquer does?

@JeromeRichard helped to make it faster using numba, which is nice. But would it become even faster if we somehow managed to apply the D&C paradigm? I tried it with some demo data like this:
```python
def split(matrix):
    row, col = matrix.shape
    row2, col2 = row // 2, col // 2
    return (matrix[:row2, :col2], matrix[:row2, col2:],
            matrix[row2:, :col2], matrix[row2:, col2:])

main = np.arange(1000 * 1000).reshape(1000, 1000)
a, b, c, d = split(main)
a = cat_mapping_fast(a, 100)
b = cat_mapping_fast(b, 100)
c = cat_mapping_fast(c, 100)
d = cat_mapping_fast(d, 100)
np.vstack((np.hstack((a, b)), np.hstack((c, d))))
```
But I couldn't take the recursion any deeper, because I don't know how to merge the sub-results.
Any solution or hint will be appreciated.
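To make the merge problem concrete, here is a small sketch (the helper name `cat_map_once` is mine, not from the post) showing that mapping the four quadrants independently and stitching them back together does not reproduce the global map: each quadrant map wraps with its own smaller modulus, so pixels that the global map would move across quadrant borders stay trapped inside their quadrant.

```python
import numpy as np

# Hypothetical helper: one cat-map step, vectorized, same index
# formulas as in the question (nx = (2x + y) % width, ny = (x + y) % height).
def cat_map_once(m):
    height, width = m.shape
    y, x = np.indices(m.shape)
    out = np.empty_like(m)
    out[(x + y) % height, (2 * x + y) % width] = m
    return out

m = np.arange(64, dtype=np.uint8).reshape(8, 8)

# Map the whole matrix in one go...
whole = cat_map_once(m)

# ...versus mapping each quadrant independently and stitching:
r, c = 4, 4
stitched = np.vstack((
    np.hstack((cat_map_once(m[:r, :c]), cat_map_once(m[:r, c:]))),
    np.hstack((cat_map_once(m[r:, :c]), cat_map_once(m[r:, c:]))),
))

# Both are permutations of the same values, but not the same permutation:
# the quadrant maps wrap modulo 4 instead of modulo 8.
print(np.array_equal(whole, stitched))  # False
```

So a correct merge step would have to undo the smaller moduli, which is why the naive stitch above cannot be the answer.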
The current code is quite slow because matrix[y, x] creates a new temporary array, and transformed_matrix[ny, nx] = matrix[y, x] is pretty slow since it needs to read nx and ny from the memory hierarchy, and the memory access pattern is not efficient. When the matrix is big, the code should be memory-bound, and the unneeded memory operations become pretty expensive. Note that the np.empty([width, height]) array contains double-precision floating-point numbers that take 8 times more space than np.uint8, so it is 8 times slower to fill in memory.

You can speed up the code a lot using Numba, basic loops and double buffering, so as to avoid creating many temporary arrays and reading big ones. The idea is to compute the indices (ny, nx) on the fly within the loops. Since the modulus is quite expensive and the memory access pattern causes the code to be rather latency-bound, multiple threads are used so as to better saturate the memory bandwidth. Here is the resulting code:
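A quick check of the dtype point: `np.empty` without an explicit dtype defaults to float64, so the rebuilt buffer silently costs 8 bytes per element instead of 1.

```python
import numpy as np

# Default np.empty dtype is float64: 8 bytes per element...
a = np.empty((4, 4))
# ...versus 1 byte per element for an explicit uint8 buffer.
b = np.empty((4, 4), dtype=np.uint8)

print(a.dtype, a.itemsize)  # float64 8
print(b.dtype, b.itemsize)  # uint8 1
```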
```python
import numpy as np
import numba as nb

@nb.njit('uint8[:,::1](uint8[:,::1], int_)', parallel=True)
def cat_mapping_fast(matrix, MAX):
    height, width = matrix.shape
    buff1 = np.empty((height, width), dtype=np.uint8)
    buff2 = np.empty((height, width), dtype=np.uint8)
    buff1[:, :] = matrix
    for counter in range(1, MAX + 1):
        for y in nb.prange(height):
            for x in range(width):
                nx = (2 * x + y) % width
                ny = (x + y) % height
                buff2[ny, nx] = buff1[y, x]
        # Double buffering: swap the roles of the two buffers
        # instead of allocating a new array each iteration.
        buff1, buff2 = buff2, buff1
    return buff1
```
This code is significantly faster than the initial one, especially when the two buffers fit in the CPU cache. When the input matrix is so huge that it does not fit in the cache, the inefficient memory access pattern makes the code a bit slower, but there is not much to be done, since the computation behaves like a big shuffle, which is not cache-friendly. Still, on a 4096x4096 matrix with MAX=20, the Numba code is 25 times faster on my 6-core machine (about 0.38 seconds compared to 10 seconds for the initial code).
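As a sanity check of the double-buffering logic, here is a pure-NumPy sketch (my own code, with Numba dropped so it runs anywhere) that replays the same per-element loop and compares it against a vectorized reference using the question's index formulas; the two must agree on any input.

```python
import numpy as np

def cat_mapping_loops(matrix, MAX):
    """Same loop/double-buffer logic as the Numba version, minus @njit."""
    height, width = matrix.shape
    buff1 = matrix.copy()
    buff2 = np.empty_like(matrix)
    for _ in range(MAX):
        for y in range(height):
            for x in range(width):
                buff2[(x + y) % height, (2 * x + y) % width] = buff1[y, x]
        buff1, buff2 = buff2, buff1  # swap buffers, no new allocation
    return buff1

def cat_mapping_vec(matrix, MAX):
    """Vectorized reference: indices computed once, reused each iteration."""
    height, width = matrix.shape
    y, x = np.indices(matrix.shape)
    ny, nx = (x + y) % height, (2 * x + y) % width
    out = matrix
    for _ in range(MAX):
        nxt = np.empty_like(out)
        nxt[ny, nx] = out
        out = nxt
    return out

m = np.arange(36, dtype=np.uint8).reshape(6, 6)
print(np.array_equal(cat_mapping_loops(m, 5), cat_mapping_vec(m, 5)))  # True
```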