I have sales statistics in an array and want to calculate the standard deviation or average from this data.
stats = [100, 98, 102, 100, 108, 23, 120]
Let's say a differential of ±20% is normal; 23 is obviously a special case.
What's the best algorithm (in any language, pseudocode, or just the principle) to find this unusual value?
You could convert them to Z-scores and look for outliers.
>>> import numpy as np
>>> stats = [100, 98, 102, 100, 108, 23, 120]
>>> mean = np.mean(stats)
>>> std = np.std(stats)
>>> stats_z = [(s - mean)/std for s in stats]
>>> np.abs(stats_z) > 2
array([False, False, False, False, False, True, False], dtype=bool)
Compute the average and standard deviation. Treat any value more than X standard deviations from the average as "unusual" (where X will probably be somewhere around 2.5 to 3.0 or so).
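As a rough sketch of that rule, here it is using only Python's standard library. The function name `find_unusual` and the `threshold` parameter (the X above) are made up for illustration, and the population standard deviation is assumed, which is what `np.std` computes by default.

```python
from statistics import mean, pstdev

def find_unusual(values, threshold=2.0):
    """Return the values more than `threshold` standard deviations from the mean."""
    avg = mean(values)
    std = pstdev(values)  # population standard deviation (assumption, matches np.std default)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - avg) / std > threshold]

stats = [100, 98, 102, 100, 108, 23, 120]
print(find_unusual(stats, threshold=2.0))  # [23]
print(find_unusual(stats, threshold=2.5))  # []
```

Note that 23 sits only about 2.38 standard deviations from the mean here, so a cutoff of 2.5 misses it. With the population standard deviation, no value in a sample of n points can be more than sqrt(n - 1) standard deviations from the mean (about 2.45 for n = 7), so for an array this small X has to be lower than the usual 2.5 to 3.0.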
There are quite a few variations on this theme. If you need something that's really statistically sound, you might want to look into some of them; they can spare you from having to defend an arbitrary choice of (say) 2.7 standard deviations as the dividing line.
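One commonly cited variation, not necessarily the one meant above, is the modified Z-score, which replaces the mean and standard deviation with the median and the median absolute deviation (MAD) so that the outlier itself does not distort the yardstick. A minimal sketch follows; the name `find_unusual_robust` is hypothetical, and the constant 0.6745 and the cutoff of 3.5 are the values usually recommended for this score, taken here as assumptions.

```python
from statistics import median

def find_unusual_robust(values, cutoff=3.5):
    """Return the values whose modified Z-score exceeds `cutoff`."""
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # over half the values equal the median; this sketch gives up
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data (assumed constant, see lead-in).
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]

stats = [100, 98, 102, 100, 108, 23, 120]
print(find_unusual_robust(stats))  # [23, 120]
```

On the sample data this flags 120 as well as 23, because most values sit within a few units of the median of 100. Whether 120 should count as unusual depends on the ±20% tolerance in the question, so this still leaves a cutoff to justify; it only removes the outlier's influence on the average and spread.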