
How to write a CASE WHEN-like statement for a NumPy array

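import numpy as np

def custom_asymmetric_train(y_true, y_pred):
    # Residual per sample; positive means the model under-predicted.
    residual = (y_true - y_pred).astype("float")
    # Positive residuals are penalized 10x more heavily.
    grad = np.where(residual>0, -2*10.0*residual, -2*residual)
    hess = np.where(residual>0, 2*10.0, 2.0)
    return grad, hess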

I want to write this statement:

    case when residual>=0 and residual<=0.5 then -2*1.2*residual
         when residual>=0.5 and residual<=0.7 then -2*1.0*residual
         when residual>0.7 then -2*2*residual
    end

However, I could not get np.where to combine conditions with AND (&) logic. How do I write this CASE WHEN logic with np.where in Python?

Thanks

Mike asked Jan 10 '20 18:01



1 Answer

This statement can be written using np.select as:

import numpy as np

residual = np.random.rand(10) -0.3 # -0.3 to get some negative values
condlist = [(residual>=0.0)&(residual<=0.5), (residual>=0.5)&(residual<=0.7), residual>0.7]
choicelist = [-2*1.2*residual, -2*1.0*residual,-2*2.0*residual]

residual = np.select(condlist, choicelist, default=residual)

Note that when several conditions in condlist are satisfied, the first one encountered is used (so residual == 0.5 takes the first branch). When all conditions evaluate to False, the default value is used. Also, you need the bitwise operator & to combine boolean NumPy arrays, because the Python keyword "and" does not work element-wise on them. Wrap each comparison in parentheses, since & binds more tightly than >= and <=.
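To make the first-match rule concrete, here is a small sketch on a hand-picked array (the values are chosen for illustration and are not from the question). It also shows that the same logic can be written as nested np.where calls, which is what the question originally asked for. The value 0.5 satisfies both the first and second conditions, and the first one wins.

```python
import numpy as np

# Hand-picked values (illustrative only), including the shared boundary 0.5.
residual = np.array([-0.2, 0.0, 0.5, 0.6, 0.7, 0.9])

# Each comparison is parenthesized because & binds tighter than >= and <=.
condlist = [
    (residual >= 0.0) & (residual <= 0.5),
    (residual >= 0.5) & (residual <= 0.7),
    residual > 0.7,
]
choicelist = [-2 * 1.2 * residual, -2 * 1.0 * residual, -2 * 2.0 * residual]

# First matching condition wins; unmatched (negative) entries keep their value.
out = np.select(condlist, choicelist, default=residual)

# Equivalent nested np.where, evaluated in the same order as condlist.
nested = np.where(condlist[0], choicelist[0],
                  np.where(condlist[1], choicelist[1],
                           np.where(condlist[2], choicelist[2], residual)))

print(out)
print(np.array_equal(out, nested))
```

Replacing & with the keyword "and" in condlist would raise a ValueError, because the truth value of a multi-element array is ambiguous.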

Let's benchmark these answers:

residual = np.random.rand(10000) -0.3

def charl_3where(residual):
    residual = np.where((residual>=0.0)&(residual<=0.5), -2*1.2*residual, residual)
    residual = np.where((residual>=0.5)&(residual<=0.7), -2*1.0*residual, residual)
    residual = np.where(residual>0.7, -2*2.0*residual, residual)
    return residual

def yaco_select(residual):
    condlist = [(residual>=0.0)&(residual<=0.5), (residual>=0.5)&(residual<=0.7), residual>0.7]
    choicelist = [-2*1.2*residual, -2*1.0*residual,-2*2.0*residual]
    residual = np.select(condlist, choicelist, default=residual)
    return residual


%timeit charl_3where(residual)
>>> 112 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit yaco_select(residual)
>>> 141 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
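The %timeit lines above are IPython magics. Outside IPython or Jupyter, a rough equivalent can be sketched with the standard-library timeit module. The function name select_version, the fixed seed, and the loop counts below are illustrative assumptions, and absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Fixed seed so repeated runs see the same data (an assumption, not in the original).
rng = np.random.default_rng(0)
residual = rng.random(10_000) - 0.3


def select_version(residual):
    """Same logic as yaco_select, repeated here so the block is self-contained."""
    condlist = [
        (residual >= 0.0) & (residual <= 0.5),
        (residual >= 0.5) & (residual <= 0.7),
        residual > 0.7,
    ]
    choicelist = [-2 * 1.2 * residual, -2 * 1.0 * residual, -2 * 2.0 * residual]
    return np.select(condlist, choicelist, default=residual)


n_loops = 200  # calls per timing run
runs = timeit.repeat(lambda: select_version(residual), number=n_loops, repeat=5)
per_call = min(runs) / n_loops  # best-of-5, in seconds per call
print(f"{per_call * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes. %timeit reports a mean and standard deviation instead, so the two figures are not directly comparable.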

Let's try to optimize this with Numba:

from numba import jit

@jit(nopython=True)
def yaco_numba(residual):
    out = np.empty_like(residual)
    for i in range(residual.shape[0]):
        if residual[i]<0.0 :
            out[i] = residual[i]
        elif residual[i]<=0.5 :
            out[i] = -2*1.2*residual[i]
        elif residual[i]<=0.7:
            out[i] = -2*1.0*residual[i]
        else: # residual>0.7
            out[i] = -2*2.0*residual[i]        
    return out

%timeit yaco_numba(residual)
>>> 6.65 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Final check

res1 = charl_3where(residual)
res2 = yaco_select(residual)
res3 = yaco_numba(residual)
np.allclose(res1,res3)
>>> True
np.allclose(res2,res3)
>>> True

The Numba version is roughly 17x faster than the previous best one on this 10,000-element array (6.65 µs versus 112 µs for charl_3where). Hope this helps.
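For completeness, here is one way the piecewise weights might be plugged back into the question's custom_asymmetric_train function. This is a sketch under two assumptions that the question does not state: negative residuals keep a weight of 1.0 (as in the original function's -2*residual branch), and the Hessian follows the original pattern of 2 times the weight.

```python
import numpy as np


def custom_asymmetric_train(y_true, y_pred):
    residual = (np.asarray(y_true) - np.asarray(y_pred)).astype("float")

    # Conditions mirror the CASE WHEN branches; the first match wins at 0.5.
    condlist = [
        (residual >= 0.0) & (residual <= 0.5),
        (residual >= 0.5) & (residual <= 0.7),
        residual > 0.7,
    ]
    # Assumption: weight 1.0 for negative residuals, as in the original function.
    weight = np.select(condlist, [1.2, 1.0, 2.0], default=1.0)

    grad = -2.0 * weight * residual
    # Assumption: Hessian is 2 * weight, following the original 2*10.0 / 2.0 pattern.
    hess = 2.0 * weight
    return grad, hess


y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.5, 0.8, 0.4, 0.0])
grad, hess = custom_asymmetric_train(y_true, y_pred)
print(grad)
print(hess)
```

Selecting the weight once and deriving both grad and hess from it keeps the two arrays consistent, so the branch boundaries only have to be written in one place.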

Yacola answered Nov 14 '22 12:11