Say I have a = [2 7 4 9 2 4 999]
and I'd like to remove 999 from the vector, since it is an obvious outlier.
Is there a general way to remove values like this? I have a set of vectors, and not all of them contain such extreme values. prctile(a,99.5) is going to output the largest number in the vector no matter how extreme (or non-extreme) it is.
There are several ways to do that, but first you must define what "extreme" means. Is it above some threshold? Above some number of standard deviations?
Or, if you know you have exactly n of these extreme events and that their values are larger than the rest, you can use sort and then delete the last n elements, and so on.
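As a minimal sketch of the sort-based idea, assuming you already know there are exactly n extreme values (n = 1 is an assumption chosen for the example vector), the second output of sort gives the positions of the largest elements, so they can be deleted without reordering the rest:

```matlab
a = [2 7 4 9 2 4 999];
n = 1;                        % assumed number of extreme values to drop
[~, idx] = sort(a);           % idx(k) is the position in a of the k-th smallest value
a(idx(end-n+1:end)) = [];     % delete the n largest values, keeping the original order
disp(a)
```

If the order of the remaining elements does not matter, sorting a and dropping its last n elements does the same job.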
For example, a(a>threshold)=[] will take care of a threshold-like definition, while a(a>mean(a)+n*std(a))=[] will take care of discarding values that are more than n standard deviations above the mean of a.
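Both rules can be tried on the vector from the question. This is only a sketch: the cutoff of 100 and n = 2 are values assumed for illustration, not recommendations.

```matlab
a = [2 7 4 9 2 4 999];

% Fixed threshold (100 is an assumed cutoff for this example)
threshold = 100;
byThreshold = a;
byThreshold(byThreshold > threshold) = [];

% More than n standard deviations above the mean (n = 2 is an assumption)
n = 2;
byStd = a;
byStd(byStd > mean(a) + n*std(a)) = [];

disp(byThreshold)
disp(byStd)
```

Both results are [2 7 4 9 2 4]. Note that in a vector this short the outlier inflates the mean and the standard deviation themselves: with n = 3 the cutoff is already above 999 and nothing would be removed.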
A completely different approach is to use the median of a. If the vector is as short as you mention, it makes sense to look at the median value; you can then discard anything above some factor n of that value: a(a>n*median(a))=[].
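A short sketch of the median rule on the same vector, with the factor n = 3 assumed purely for illustration. It also assumes the data are positive, since multiplying a zero or negative median by n would not give a meaningful cutoff:

```matlab
a = [2 7 4 9 2 4 999];
n = 3;                        % assumed factor; tune it for your data
cutoff = n * median(a);       % median(a) is 4 here, so the cutoff is 12
filtered = a(a <= cutoff);    % keep only the values at or below the cutoff
disp(filtered)
```

Unlike the mean, the median is barely affected by the single value 999, so the cutoff stays close to the bulk of the data.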
Lastly, one way to decide how to treat these spikes is to take a histogram of the data and work from there.
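To inspect the distribution, something like the following could be used. It is a sketch that assumes 10 bins and uses hist because it is available in older MATLAB releases and in Octave; newer MATLAB releases recommend histcounts or histogram instead.

```matlab
a = [2 7 4 9 2 4 999];
nbins = 10;                           % assumed number of bins
[counts, centers] = hist(a, nbins);   % counts per bin and the bin centers
disp([centers(:) counts(:)])          % one row per bin: center, count
% Calling hist(a, nbins) with no output arguments draws the bar plot instead.
```

For this vector, six values fall in the first bin and one in the last, with empty bins in between. A gap like that suggests a natural place to put a threshold.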
I can think of two:
mean +/- (n * standard deviation)
In both cases n must be chosen by the user.