How to vectorize finding max value in numpy array with if statement?

Q: What is the way to find the maximum number in the Numpy array?

Python Numpy – Get Maximum Value of Array Given a numpy array, you can find the maximum value of all the elements in the array. To get the maximum value of a Numpy Array, you can use numpy function numpy. max() function.

Q: What does vectorize do in Numpy?

Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy.

Q: Does Numpy vectorize fast?

Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance.

Context:

I tried to implement the Sequential Minimal Optimization algorithm for solving SVM. I use maximal violating pair approach.

The problem:

In working set selection procedure i want to find maximum value of gradient and its index for elements which met some condition, y[i]*alpha[i]<0 or y[i]*alpha[i]

#y - array of -1 and 1
y=np.array([-1,1,1,1,-1,1])
#alpha- array of floats in range [0,C]
alpha=np.array([0.4,0.1,1.33,0,0.9,0])
#grad - array of floats
grad=np.array([-1,-1,-0.2,-0.4,0.4,0.2])


GMaxI=float('-inf')       
GMax_idx=-1        
n=alpha.shape[0] #usually n=100000
C=4
B=[0,0,C]
for i in xrange(0,n):
    yi=y[i]  #-1 or 1
    alpha_i=alpha[i]

    if (yi * alpha_i< B[yi+1]): # B[-1+1]=0 B[1+1]=C
        if( -yi*grad[i]>=GMaxI):
            GMaxI= -yi*grad[i]
            GMax_idx = i

This procedure is called many times (~50000) and profiler shows that this is the bottleneck. It is possible to vectorize this code?

Edit 1: Add some small exemplary data

Edit 2: I have checked solution proposed by hwlau , larsmans and Mr E. Only solutions proposed Mr E is correct. Below sample code with all three answers:

import numpy as np
y=np.array([   -1,  -1,   -1,  -1,   -1,   -1,   -1,   -1])
alpha=np.array([0,  0.9,  0.4, 0.1, 1.33,    0,  0.9,    0])
grad=np.array([-3, -0.5,   -1,  -1, -0.2,   -4, -0.4, -0.3])
C=4
B=np.array([0,0,C])

#hwlau - wrong index and value
filter = (y*alpha < C*0.5*(y+1)).astype('float')
GMax_idx = (filter*(-y*grad)).argmax()
GMax = -y[GMax_idx]*grad[GMax_idx]

print GMax_idx,GMax

#larsmans - wrong index
neg_y_grad = (-y * grad)[y * alpha < B[y + 1]]
GMaxI = np.max(neg_y_grad)
GMax_ind = np.argmax(neg_y_grad)

print GMax_ind,GMaxI

#Mr E - correct result
BY = np.take(B, y+1)
valid_mask = (y * alpha < BY)
values = -y * grad
values[~valid_mask] = np.min(values) - 1.0

GMaxI = values.max()
GMax_idx = values.argmax()
print GMax_idx,GMaxI

Output (GMax_idx, GMaxI)
0 -3.0 
3 -0.2
4 -0.2

Conclusions

After checking all solutions, the fastest one (2x-6x) is solution proposed by @ali_m. However it requires to install some python packages: numba and all its prerequisites.

I have some trouble to use numba with class methods, so I create global functions which are autojited with numba, my solution look something like this:

from numba import autojit  

@autojit
def FindMaxMinGrad(A,B,alpha,grad,y):
    '''
    Finds i,j indices with maximal violatin pair scheme
    A,B - 3 dim arrays, contains bounds A=[-C,0,0], B=[0,0,C]
    alpha - array like, contains alpha coeficients
    grad - array like, gradient
    y - array like, labels
    '''
    GMaxI=-100000
    GMaxJ=-100000

    GMax_idx=-1
    GMin_idx=-1

    for i in range(0,alpha.shape[0]):

        if (y[i] * alpha[i]< B[y[i]+1]):
            if( -y[i]*grad[i]>GMaxI):
                GMaxI= -y[i]*grad[i]
                GMax_idx = i

        if (y[i] * alpha[i]> A[y[i]+1]):
            if( y[i]*grad[i]>GMaxJ):
                GMaxJ= y[i]*grad[i]
                GMin_idx = i

    return (GMaxI,GMaxJ,GMax_idx,GMin_idx)  

class SVM(object):

    def working_set(self,....):
        FindMaxMinGrad(.....)

561

asked Dec 19 '13 12:12

ksopyla

1 Answers

You can probably do quite a lot better than plain vectorization if you use numba to JIT-compile your original code that used nested loops.

import numpy as np
from numba import autojit

@autojit
def jit_max_grad(y, alpha, grad, B):
    maxgrad = -inf
    maxind = -1
    for ii in xrange(alpha.shape[0]):
        if (y[ii] * alpha[ii] < B[y[ii] + 1]):
            g = -y[ii] * grad[ii]
            if g >= maxgrad:
                maxgrad = g
                maxind = ii
    return maxind, maxgrad

For comparison, here's Mr E's vectorized version:

def mr_e_max_grad(y, alpha, grad, B):

    BY = np.take(B, y+1)
    valid_mask = (y * alpha < BY)
    values = -y * grad
    values[~valid_mask] = np.min(values) - 1.0
    GMaxI = values.max()
    GMax_idx = values.argmax()
    return GMax_idx, GMaxI

Timing:

y = np.array([   -1,  -1,   -1,  -1,   -1,   -1,   -1,   -1])
alpha = np.array([0,  0.9,  0.4, 0.1, 1.33,    0,  0.9,    0])
grad = np.array([-3, -0.5,   -1,  -1, -0.2,   -4, -0.4, -0.3])
C = 4
B = np.array([0,0,C])

%timeit mr_e_max_grad(y, alpha, grad, B)
# 100000 loops, best of 3: 19.1 µs per loop

%timeit jit_max_grad(y, alpha, grad, B)
# 1000000 loops, best of 3: 1.07 µs per loop

Update: if you want to see what the timings look like on bigger arrays, it's easy to define a function that generates semi-realistic fake data based on your description in the question:

def make_fake(n, C=4):
    y = np.random.choice((-1, 1), n)
    alpha = np.random.rand(n) * C
    grad = np.random.randn(n)
    B = np.array([0,0,C])
    return y, alpha, grad, B

%%timeit y, alpha, grad, B = make_fake(100000, 4)
mr_e_max_grad(y, alpha, grad, B)
# 1000 loops, best of 3: 1.83 ms per loop

%%timeit y, alpha, grad, B = make_fake(100000, 4)
jit_max_grad(y, alpha, grad, B)
# 1000 loops, best of 3: 471 µs per loop

answered Sep 22 '22 15:09

ali_m

Related questions
                            
                                Python/Numpy: How do you assign the end+1 element of an array similar to how it's done in Matlab?
                            
                                Django REST framework absolute urls with nginx always return 127.0.0.1
                            
                                python get current variables of the caller function
                            
                                using variable in import command
                            
                                Python regex: find a substring that doesn't contain a substring
                            
                                Fixing code to make a triangle
                            
                                Is there a Python API for R's ggplot2? [closed]
                            
                                keeps text rotated in data coordinate system after resizing?
                            
                                Django Rest Framework Paginate By Specified GET Parameter
                            
                                So what is the story with 4 quotes?
                            
                                Django: Map data from an external APIs into a model?
                            
                                Summing each 3x3 window of a M*N matrix, into a M/3*N/3 matrix with numpy
                            
                                numpy.dot how to calculate 1-D array with 2-D array
                            
                                Error installing pydev [duplicate]
                            
                                Theano element wise maximum
                            
                                Expand/collapse ttk Treeview branch
                            
                                How to separate string and number in Python list? [closed]
                            
                                Python: How do I get a list of all keys in a dictionary of dictionaries, at a given depth [closed]
                            
                                Python scipy chisquare returns different values than R chisquare
                            
                                Pandas: iterate over unique values of a column that is already in sorted order

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to vectorize finding max value in numpy array with if statement?

Tags:

python

arrays

vectorization

max

numpy