Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I improve my custom function vectorization using numpy

I am new to python, and even more new to vectorization. I have attempted to vectorize a custom similarity function that should return a matrix of pairwise similarities between each row in an input array.

IMPORTS:

import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view

INPUT:

np.random.seed(11)

a = np.array([0, 0, 0, 0, 0, 10, 0, 0, 0, 50, 0, 0, 5, 0, 0, 10])
b = np.array([0, 0, 5, 0, 0, 10, 0, 0, 0, 50, 0, 0, 10, 0, 0, 5])
c = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])

m = np.array((a,b,c))

OUTPUT:

custom_func(m)

array([[   0,  440, 1903],
       [ 440,    0, 1603],
       [1903, 1603,    0]])

FUNCTION:

def custom_func(arr):
    diffs = 0
    max_k = 6
    
    for n in range(1, max_k):

        arr1 = np.array([np.sum(i, axis = 1) for i in sliding_window_view(arr, window_shape = n, axis = 1)])
    
        # this function uses np.maximum and np.minimum to subtract the max and min elements (element-wise) between two rows and then sum up the entire of that subtraction
        diffs += np.sum((np.array([np.maximum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))]) - np.array([np.minimum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))])), axis = 1) * n
    
    diffs = diffs.reshape(len(arr), -1)
    
    return diffs

The function is quite simple, it sums up the element-wise differences between max and minimum of rows in N sliding windows. This function is much faster than what I was using before finding out about vectorization today (for loops and pandas dataframes yay).

My first thought is to figure out a way to find both the minimum and maximum of my arrays in a single pass since I currently THINK it has to do two passes, but I was unable to figure out how. Also there is a for loop in my current function because I need to do this for multiple N sliding windows, and I am not sure how to do this without the loop.

Any help is appreciated!

like image 760
dddxxx Avatar asked Feb 11 '26 11:02

dddxxx


1 Answers

Here are the several optimizations you can apply on the code:

  • use the Numba's JIT to speed up the computation and replace the product call with nested loops
  • use a more efficient sliding window algorithm (better complexity)
  • avoid to compute multiple time product and arrange in the loop
  • reduce the number of implicit temporary arrays allocated (and array Numpy calls)
  • do not compute the lower triangular part of diffs since it will always be symmetric
    (just copy the upper triangular part)
  • use integer-based indexing rather than slow slow floating-point one

Here is the resulting code:

import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view
import numba as nb

@nb.njit
def custom_func_fast(arr):
    h, w = arr.shape[0], arr.shape[1]
    diffs = np.zeros((h, h), dtype=arr.dtype)
    max_k = 6

    for n in range(1, max_k):
        arr1 = np.empty(shape=(h, w-n+1), dtype=arr.dtype)

        for i in range(h):
            # Efficient sliding window algorithm
            assert w >= n
            s = np.sum(arr[i, 0:n])
            arr1[i, 0] = s
            for j in range(n, w):
                s -= arr[i, j-n]
                s += arr[i, j]
                arr1[i, j-n+1] = s

        # Efficient distance matrix computation
        for i in range(h):
            for j in range(i+1, h):
                s = 0
                for k in range(w-n+1):
                    s += np.abs(arr1[i,k] - arr1[j,k])
                diffs[i, j] += s * n

    # Fill the lower triangular part
    for i in range(h):
        for j in range(i):
            diffs[i, j] = diffs[j, i]

    return diffs

The resulting code is 290 times faster on the example input array on my machine.

like image 106
Jérôme Richard Avatar answered Feb 14 '26 03:02

Jérôme Richard



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!