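import numpy as np

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    # Positive residuals (under-predictions) are penalized 10x more heavily.
    grad = np.where(residual > 0, -2 * 10.0 * residual, -2 * residual)
    hess = np.where(residual > 0, 2 * 10.0, 2.0)
    return grad, hess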
I want to write this statement:
case when residual>=0 and residual<=0.5 then -2*1.2*residual
when residual>=0.5 and residual<=0.7 then -2*1.0*residual
when residual>0.7 then -2*2*residual end
However, I cannot get np.where to accept the & (and) logic. How do I write this CASE WHEN logic with np.where in Python?
Thanks
The CASE statement goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). So, once a condition is true, it will stop reading and return the result.
This statement can be written using np.select as:
import numpy as np
residual = np.random.rand(10) -0.3 # -0.3 to get some negative values
condlist = [(residual>=0.0)&(residual<=0.5), (residual>=0.5)&(residual<=0.7), residual>0.7]
choicelist = [-2*1.2*residual, -2*1.0*residual,-2*2.0*residual]
residual = np.select(condlist, choicelist, default=residual)
Note that when multiple conditions in condlist are satisfied, the first one encountered is used (so residual == 0.5 falls into the first branch). When all conditions evaluate to False, the default value is used. Also, you need to combine boolean NumPy arrays with the bitwise operator &, because the Python keyword and does not work element-wise on them. Each comparison must be wrapped in parentheses, since & binds more tightly than >= and <=.
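Putting this back into the objective from the question, one possible sketch is shown below. It selects a per-element weight with np.select and derives both the gradient and the Hessian from it. Two things here are assumptions rather than something stated in the question: the weight for negative residuals is taken to be 1.0 (the CASE expression has no ELSE branch), and the Hessian is taken to follow the same 2 * weight pattern as the original function.

```python
import numpy as np

def custom_asymmetric_train(y_true, y_pred):
    """Weighted squared-error objective with a piecewise weight on the residual."""
    residual = (np.asarray(y_true) - np.asarray(y_pred)).astype("float")
    # Parenthesize each comparison: & binds more tightly than >= and <=.
    condlist = [
        (residual >= 0.0) & (residual <= 0.5),
        (residual >= 0.5) & (residual <= 0.7),  # 0.5 itself is caught by the first branch
        residual > 0.7,
    ]
    # Weights taken from the CASE expression; default=1.0 for residual < 0
    # is an assumption, not part of the question.
    weights = np.select(condlist, [1.2, 1.0, 2.0], default=1.0)
    grad = -2.0 * weights * residual
    hess = 2.0 * weights  # assumed to mirror the original hess definition
    return grad, hess

y_true = np.array([1.0, 1.0, 1.0, 1.0])
y_pred = np.array([1.2, 0.8, 0.4, 0.0])  # residuals: -0.2, 0.2, 0.6, 1.0
grad, hess = custom_asymmetric_train(y_true, y_pred)
print(grad)
print(hess)
```

Selecting the weight once, instead of the full expression per branch, keeps the gradient and Hessian consistent with each other by construction.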
Let's benchmark these answers:
residual = np.random.rand(10000) -0.3
def charl_3where(residual):
    residual = np.where((residual>=0.0)&(residual<=0.5), -2*1.2*residual, residual)
    residual = np.where((residual>=0.5)&(residual<=0.7), -2*1.0*residual, residual)
    residual = np.where(residual>0.7, -2*2.0*residual, residual)
    return residual

def yaco_select(residual):
    condlist = [(residual>=0.0)&(residual<=0.5), (residual>=0.5)&(residual<=0.7), residual>0.7]
    choicelist = [-2*1.2*residual, -2*1.0*residual, -2*2.0*residual]
    residual = np.select(condlist, choicelist, default=residual)
    return residual
%timeit charl_3where(residual)
>>> 112 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit yaco_select(residual)
>>> 141 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
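The %timeit lines above are IPython magics, so they only run in IPython or Jupyter. As a rough sketch of how the same comparison could be run from a plain script, the standard-library timeit module can be used instead. The function names three_where and with_select are placeholders introduced here for illustration, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
residual = rng.random(10_000) - 0.3  # shift by 0.3 to get some negative values

def three_where(residual):
    # Chained np.where calls; transformed values become <= 0,
    # so they cannot match a later condition.
    residual = np.where((residual >= 0.0) & (residual <= 0.5), -2 * 1.2 * residual, residual)
    residual = np.where((residual >= 0.5) & (residual <= 0.7), -2 * 1.0 * residual, residual)
    residual = np.where(residual > 0.7, -2 * 2.0 * residual, residual)
    return residual

def with_select(residual):
    condlist = [(residual >= 0.0) & (residual <= 0.5),
                (residual >= 0.5) & (residual <= 0.7),
                residual > 0.7]
    choicelist = [-2 * 1.2 * residual, -2 * 1.0 * residual, -2 * 2.0 * residual]
    return np.select(condlist, choicelist, default=residual)

n_calls = 100
best_us = {}
for func in (three_where, with_select):
    # Take the best of 5 repeats, each timing n_calls calls.
    times = timeit.repeat(lambda: func(residual), number=n_calls, repeat=5)
    best_us[func.__name__] = min(times) / n_calls * 1e6
    print(f"{func.__name__}: {best_us[func.__name__]:.1f} µs per call (best of 5)")
```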
Let's try to optimize this with Numba:
from numba import jit

@jit(nopython=True)
def yaco_numba(residual):
    out = np.empty_like(residual)
    for i in range(residual.shape[0]):
        if residual[i]<0.0:
            out[i] = residual[i]
        elif residual[i]<=0.5:
            out[i] = -2*1.2*residual[i]
        elif residual[i]<=0.7:
            out[i] = -2*1.0*residual[i]
        else: # residual>0.7
            out[i] = -2*2.0*residual[i]
    return out
%timeit yaco_numba(residual)
>>> 6.65 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Final check
res1 = charl_3where(residual)
res2 = yaco_select(residual)
res3 = yaco_numba(residual)
np.allclose(res1,res3)
>>> True
np.allclose(res2,res3)
>>> True
The Numba version is roughly 17x faster than the previous best one (6.65 µs versus 112 µs per loop). Hope this helps.