I want to replace outliers in a list. To do this, I define an upper and a lower bound: every value above `upper_bound` or below `lower_bound` is replaced with that bound's value. My approach does this in two steps using a NumPy array.

Now I wonder whether it's possible to do this in one step, since I suspect it could improve both performance and readability. Is there a shorter way to do this?
```python
import numpy as np

lowerBound, upperBound = 3, 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[arr > upperBound] = upperBound
arr[arr < lowerBound] = lowerBound
print(arr)  # [3 3 3 3 4 5 6 7 7 7]
```
You can use `numpy.clip`:
```python
In [1]: import numpy as np
In [2]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [3]: lowerBound, upperBound = 3, 7
In [4]: np.clip(arr, lowerBound, upperBound, out=arr)
Out[4]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
In [5]: arr
Out[5]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
```
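Passing `out=arr` writes the result back into `arr`. If you leave `out` off, `np.clip` returns a new array and the original stays untouched, which is handy when you still need the raw values. A minimal sketch using the question's data; `clipped` and `upper_only` are just illustrative names:

```python
import numpy as np

lowerBound, upperBound = 3, 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Without `out`, np.clip allocates and returns a new array
clipped = np.clip(arr, lowerBound, upperBound)

# Either bound may be None to clip on one side only (here: upper bound only)
upper_only = np.clip(arr, None, upperBound)

print(clipped)     # [3 3 3 3 4 5 6 7 7 7]
print(upper_only)  # [0 1 2 3 4 5 6 7 7 7]
print(arr)         # [0 1 2 3 4 5 6 7 8 9]  (original is unchanged)
```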
For an alternative that doesn't rely on NumPy, you could always do:

```python
arr = [max(lower_bound, min(x, upper_bound)) for x in arr]
```
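Here is that comprehension as a self-contained snippet on the question's data, wrapped in a small helper. The name `clamp_list` is hypothetical, chosen only for illustration:

```python
def clamp_list(values, lower_bound, upper_bound):
    """Return a new list with every value clamped to [lower_bound, upper_bound]."""
    return [max(lower_bound, min(x, upper_bound)) for x in values]


arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
result = clamp_list(arr, 3, 7)
print(result)  # [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]
```

This works on any iterable of comparable values and builds a new list rather than modifying the input in place.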
If you just wanted to set an upper bound, you could of course write `arr = [min(x, upper_bound) for x in arr]`. Similarly, if you just wanted a lower bound, you'd use `max` instead.

Here, I've simply applied both operations together.
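The two one-sided variants, run side by side on the question's data (`capped` and `floored` are illustrative names):

```python
lower_bound, upper_bound = 3, 7
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Upper bound only: min() caps values that are too large
capped = [min(x, upper_bound) for x in arr]
print(capped)   # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]

# Lower bound only: max() raises values that are too small
floored = [max(lower_bound, x) for x in arr]
print(floored)  # [3, 3, 3, 3, 4, 5, 6, 7, 8, 9]
```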
Edit: Here's a slightly more in-depth explanation.

Given an element `x` of the array (and assuming that your `upper_bound` is at least as big as your `lower_bound`!), you'll have one of three cases:

1. `x < lower_bound`
2. `x > upper_bound`
3. `lower_bound <= x <= upper_bound`
In case 1, `min(x, upper_bound)` is just `x`, so the `max`/`min` expression first evaluates to `max(lower_bound, x)`, which then resolves to `lower_bound`.

In case 2, the expression first becomes `max(lower_bound, upper_bound)`, which then becomes `upper_bound`.

In case 3, we get `max(lower_bound, x)`, which resolves to just `x`.

In all three cases, the output is what we want.
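To sanity-check the case analysis, you can compare the one-line expression against an explicit `if`/`else` version over a range of inputs. Both helper names (`clamp`, `clamp_by_cases`) are hypothetical, introduced only for this check:

```python
def clamp(x, lower_bound, upper_bound):
    # The one-line expression from the answer
    return max(lower_bound, min(x, upper_bound))


def clamp_by_cases(x, lower_bound, upper_bound):
    # The three cases spelled out explicitly
    if x < lower_bound:   # case 1
        return lower_bound
    if x > upper_bound:   # case 2
        return upper_bound
    return x              # case 3: lower_bound <= x <= upper_bound


lower_bound, upper_bound = 3, 7
for x in range(-5, 15):
    assert clamp(x, lower_bound, upper_bound) == clamp_by_cases(x, lower_bound, upper_bound)
print("all cases agree")
```

The assumption about the bounds matters: if `lower_bound > upper_bound`, then `min(x, upper_bound)` is always below `lower_bound`, so the expression returns `lower_bound` for every input.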