Removing extreme values in a vector in Matlab?

Tags:

matlab

Say I have a = [2 7 4 9 2 4 999].

I'd like to remove 999 from the vector (it is an obvious outlier).

Is there a general way to remove values like this? I have a set of vectors, and not all of them have extreme values like that. prctile(a,99.5) is no help, since it returns the largest number in the vector no matter how extreme (or non-extreme) it is.

InquilineKea asked Dec 08 '22 17:12

2 Answers

There are several ways to do that, but first you must define what "extreme" means. Is it above some threshold? Above some number of standard deviations? Or, if you know you have exactly n of these extreme events and that their values are larger than the rest, you can use sort and then delete the last n elements, etc.
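A minimal sketch of the "drop the n largest values" idea, assuming n is known in advance. Rather than deleting the tail of a sorted copy, it uses the second output of sort (the indices), so the remaining elements keep their original order; that index-based deletion is an illustrative choice, not something the answer specifies.

```matlab
% Remove the n largest values from a, keeping the rest in original order.
% n is assumed to be known in advance (illustrative value below).
a = [2 7 4 9 2 4 999];
n = 1;                          % number of extreme values to drop

[~, idx] = sort(a, 'descend');  % idx(1) is the position of the largest value
a(idx(1:n)) = [];               % delete the n largest entries in place

disp(a)                         % 2 7 4 9 2 4
```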

For example, a(a>threshold)=[] takes care of a threshold-like definition, while a(a>mean(a)+n*std(a))=[] discards values that are more than n standard deviations above the mean of a.
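The two one-liners applied to the vector from the question, as a small script. The cutoff threshold = 100 and the multiplier n = 2 are illustrative values picked for this data, not recommendations. One caveat is easier to see in numbers: mean(a) and std(a) are computed with the outlier included, so the spike inflates both. For a sample of N values, no point can lie more than (N-1)/sqrt(N) sample standard deviations from the mean; for N = 7 that is about 2.27, so n = 3 removes nothing here.

```matlab
a = [2 7 4 9 2 4 999];

% Fixed threshold: drop everything above a hand-picked value.
threshold = 100;                 % illustrative cutoff for this data only
b = a;
b(b > threshold) = [];           % b is now [2 7 4 9 2 4]

% Standard-deviation rule: drop values more than n std above the mean.
% mean and std include the outlier: mean is about 146.7, std about 375.8.
n = 2;                           % illustrative multiplier
c = a;
c(c > mean(c) + n*std(c)) = [];  % cutoff about 898.4, so 999 is removed

% Caveat: with 7 samples no value can exceed 6/sqrt(7) (about 2.27) std
% from the mean, so n = 3 (cutoff about 1274.2) removes nothing.
d = a;
d(d > mean(d) + 3*std(d)) = [];  % d is unchanged
```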

A completely different approach is to use the median of a. If the vector is as short as in your example, the median is a more robust reference value than the mean, and you can threshold anything above some multiple of it: a(a>n*median(a))=[].
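A sketch of the median rule on the same vector, assuming all values are positive (a multiple of the median is not a meaningful cutoff when the median is zero or negative). The factor n = 3 is a hypothetical choice. The median of a is 4, so the cutoff is 12 and only 999 is removed; a single spike barely moves the median, whereas it drags the mean up to about 146.7.

```matlab
a = [2 7 4 9 2 4 999];
n = 3;                          % hypothetical multiple of the median

m = median(a);                  % 4: essentially unaffected by the single spike
b = a;
b(b > n*m) = [];                % cutoff 12 removes only 999

disp(b)                         % 2 7 4 9 2 4
```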

Finally, a good way to decide how to treat these spikes is to take a histogram of the data and work from there.
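One way to inspect the distribution numerically, sketched under the assumption that 10 equally spaced bins are enough for data this small. It uses hist because that function runs in both MATLAB and Octave; recent MATLAB releases recommend histogram (for the plot) or histcounts (for the counts only) instead. For the example vector, six values land in the first bin, 999 sits alone in the last one, and every bin in between is empty.

```matlab
a = [2 7 4 9 2 4 999];

% Count the values in 10 equally spaced bins between min(a) and max(a).
[counts, centers] = hist(a, 10);

% One row per bin: bin center, then number of values in that bin.
disp([centers(:) counts(:)])

% A long run of empty bins between the bulk of the data and a few
% isolated counts is the typical signature of a spike.
```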

bla answered Dec 28 '22 20:12

I can think of two ways:

  • Sort your matrix and remove n elements from the top and the bottom.
  • Compute the mean and the standard deviation, and discard all values that fall outside mean +/- (n * standard deviation).

In both cases n must be chosen by the user.
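Both ideas from the list, sketched on the same vector. The names k, trimmed and kept are illustrative: k plays the role of n in the second rule and is renamed only to keep it apart from the trimming count. Trimming always throws away 2n values, whether or not they are outliers, and returns the survivors sorted. The two-sided mean +/- k*std rule keeps the original order, but because the mean and std are computed with the outlier included, a large spike widens the very interval it is tested against.

```matlab
a = [2 7 4 9 2 4 999];
n = 1;                              % chosen by the user (illustrative)

% 1) Trim: sort and drop the n smallest and n largest values.
%    The result is sorted, not in the original order.
s = sort(a);
trimmed = s(n+1:end-n);             % [2 4 4 7 9]

% 2) Two-sided rule: keep only values within mean +/- k*std.
k = 2;                              % illustrative multiplier
mu = mean(a);
sigma = std(a);
kept = a(abs(a - mu) <= k*sigma);   % [2 7 4 9 2 4]
```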

sfotiadis answered Dec 28 '22 19:12