Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Numpy vectorize nested for-loops for combinatorics

Given an nxn array A of real positive numbers, I'm trying to find the minimum of the maximum of the element-wise minimum of all combinations of three rows of the 2-d array. Using for-loops, that comes out to something like this:

import numpy as np

n = 100
np.random.seed(2)
A = np.random.rand(n,n)
global_best = np.inf

for i in range(n-2):
    for j in range(i+1, n-1):
        for k in range(j+1, n):
            # find the maximum of the element-wise minimum of the three vectors
            local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
            # if local_best is lower than global_best, update global_best
            if (local_best < global_best):
                global_best = local_best
                save_rows = [i, j, k]

print global_best, save_rows

In the case for n = 100, the output should be this:

Out[]: 0.492652949593 [6, 41, 58]

I have a feeling though that I could do this much faster using Numpy vectorization, and would certainly appreciate any help on doing this. Thanks.

like image 801
ToneDaBass Avatar asked Apr 24 '18 03:04

ToneDaBass


People also ask

What is the use of vectorize in NumPy?

numpy.vectorize () function The vectorize () function is used to generalize function class. Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns an single or tuple of numpy array as output.

Why do we use vectorized array functions in Python?

Instead, we use functions defined by various modules which are highly optimized that reduces the running and execution time of code. Vectorized array operations will be faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations.

What is the difference between Py Func and vectorize?

If provided, pyfunc will be called with (and expected to return) arrays with shapes given by the size of corresponding core dimensions. By default, pyfunc is assumed to take scalars as input and output. Vectorized function. The vectorize function is provided primarily for convenience, not for performance.

What is vectorized processing in Python?

Vectorized processing, in contrast, may be applied when the order of processing does not matter. As suggested above, the built-in methods in NumPy and Pandas are built with C, which allows for vectorization.


1 Answers

This solution is 5x faster for n=100:

coms = np.fromiter(itertools.combinations(np.arange(n), 3), 'i,i,i').view(('i', 3))
best = A[coms].min(1).max(1)
at = best.argmin()
global_best = best[at]
save_rows = coms[at]

The first line is a bit convoluted but turns the result of itertools.combinations into a NumPy array which contains all possible [i,j,k] index combinations.

From there, it's a simple matter of indexing into A using all the possible index combinations, then reducing along the appropriate axes.

This solution consumes a lot more memory as it builds the concrete array of all possible combinations A[coms]. It saves time for smallish n, say under 250, but for large n the memory traffic will be very high and it may be slower than the original code.

like image 94
John Zwinck Avatar answered Oct 13 '22 13:10

John Zwinck