I want to detect and store outliers from a list, and this is what I am doing.
Code:
import numpy as np

def outliers(y, thresh=3.5):
    m = np.median(y)
    abs_dev = np.abs(y - m)
    # Separate MADs for the points at or below and at or above the median
    left_mad = np.median(abs_dev[y <= m])
    right_mad = np.median(abs_dev[y >= m])
    y_mad = left_mad * np.ones(len(y))
    y_mad[y > m] = right_mad
    modified_z_score = 0.6745 * abs_dev / y_mad
    modified_z_score[y == m] = 0
    return modified_z_score > thresh

bids = [5000,5500,4500,1000,15000,5200,4900]
z = outliers(bids)
bidd = np.array(bids)
out_liers = bidd[z]
This gives results as:
out_liers = array([ 1000, 15000])
Is there a better way to do this so that I get the results as a list rather than an array? Also, could someone please explain why we use thresh=3.5 and modified_z_score = 0.6745 * abs_dev / y_mad?
This works:
import numpy as np

def outliers_modified_z_score(ys, threshold=3.5):
    ys_arr = np.array(ys)
    median_y = np.median(ys_arr)
    median_absolute_deviation_y = np.median(np.abs(ys_arr - median_y))
    modified_z_scores = 0.6745 * (ys_arr - median_y) / median_absolute_deviation_y
    # Select the outliers and return them as a plain Python list
    return ys_arr[np.abs(modified_z_scores) > threshold].tolist()
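On the two constants asked about in the question: 0.6745 is approximately the 0.75 quantile of the standard normal distribution. For normally distributed data the MAD is about 0.6745 times the standard deviation, so the factor puts the score on roughly the same scale as an ordinary z-score. The 3.5 cutoff is a convention commonly attributed to Iglewicz and Hoaglin, not a hard rule, so treat it as a tunable default. Below is a minimal standard-library sketch that checks the constant and reproduces the result for the bids. The names `K` and `modified_z_scores` are my own, and the zero-MAD guard is an assumption about how you would want to handle that case.

```python
from statistics import NormalDist, median

# 0.6745 is (approximately) the 75th percentile of the standard normal.
K = NormalDist().inv_cdf(0.75)
print(round(K, 4))  # 0.6745

def modified_z_scores(values):
    """Return K * (x - median) / MAD for each x (hypothetical helper)."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # Assumption: fail loudly rather than divide by zero.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [K * (x - med) / mad for x in values]

bids = [5000, 5500, 4500, 1000, 15000, 5200, 4900]
scores = modified_z_scores(bids)
outlier_bids = [b for b, s in zip(bids, scores) if abs(s) > 3.5]
print(outlier_bids)  # [1000, 15000]
```

Since this version never builds a NumPy array, the result is already a plain list. For large inputs the NumPy version will be faster.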
That's because you are using NumPy functions. The default type they return is numpy.ndarray, which speeds up the computations. If you just need a list as the output, use the tolist() method:
z = outliers(bids)
bidd = np.array(bids)
out_liers = bidd[z].tolist()
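A small sketch of why tolist() is usually preferable to wrapping the array in list(): tolist() converts the elements to native Python types, while list() keeps them as NumPy scalars. The variable names here are only for illustration.

```python
import numpy as np

arr = np.array([1000, 15000])

as_list = arr.tolist()   # elements become plain Python ints
via_list = list(arr)     # elements stay NumPy integer scalars

print(type(as_list[0]) is int)               # True
print(isinstance(via_list[0], np.integer))   # True
```

The difference matters mostly when the values are later serialized (for example with json) or compared by type.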