
Univariate outlier detection

This time I am not asking directly how to detect outliers, as I did in one of my previous questions. I have read some posts on this topic, but they did not give me what I needed. I have the following set of values:

```r
y <- c(0.59, 0.61, 0.59, 1.55, 1.33, 3.50, 1.00, 1.22, 2.50, 3.00,
       3.79, 3.98, 4.33, 4.45, 4.59, 4.72, 4.82, 4.90, 4.96, 7.92,
       5.01, 5.01, 4.94, 5.05, 5.04, 5.03, 5.06, 5.10, 5.04, 5.06,
       7.77, 5.07, 5.08, 5.08, 5.12, 5.12, 5.08, 5.17, 5.18)
```

As most researchers say, outlier detection depends not only on the data but also on the context. I have tried several R packages, such as outliers (Grubbs test), extremevalues and mvoutlier (pcout method), but could not work out the best way to use them. In this case (given my requirements), 7.77 (obs no 31), 7.92 (obs no 20) and 3.50 (obs no 6) are outliers. With the Grubbs test from the outliers package I can detect 7.77 and 7.92 as outliers, but not 3.50. I don't know whether I can post a plot of the data here, but from the trend or distribution of the data in a plot, observation no 6 is obviously an outlier.
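One way to see why a global test such as Grubbs' misses 3.50: the Grubbs statistic is the largest absolute z-score, where each z-score is the distance from the overall mean in units of the overall standard deviation. The short base-R sketch below computes these z-scores for the data above. The value 3.50 lies only about 0.4 SD below the overall mean, while the first values (0.59, 0.61) lie about as far from the mean as 7.77 does. So 3.50 is only unusual relative to its neighbours on the rising part of the curve, which a test on the pooled values cannot see.

```r
# Data from the question
y <- c(0.59, 0.61, 0.59, 1.55, 1.33, 3.50, 1.00, 1.22, 2.50, 3.00,
       3.79, 3.98, 4.33, 4.45, 4.59, 4.72, 4.82, 4.90, 4.96, 7.92,
       5.01, 5.01, 4.94, 5.05, 5.04, 5.03, 5.06, 5.10, 5.04, 5.06,
       7.77, 5.07, 5.08, 5.08, 5.12, 5.12, 5.08, 5.17, 5.18)

# Global z-scores: distance from the overall mean in units of the overall SD.
# The Grubbs statistic is max(abs(z)), so it only ever looks at this scale.
z <- (y - mean(y)) / sd(y)

which.max(abs(z))             # 20, i.e. the value 7.92
round(z[c(1, 6, 20, 31)], 2)  # start of the curve, obs 6, and the two spikes
```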

I am trying to fit a non-linear model to this data, but because of these outliers I cannot find the best fit (although the best fit is not my only requirement). In any case, I need to detect these outliers, because I will be fitting a separate model to them.

My question is very simple: can I somehow detect these 3 outliers with a standard package, or how can I use my fitted non-linear model to help detect them?

Best regards

Shahzad


asked Nov 11 '12 00:11 by Shahzad


1 Answer

Just to say that I tried using detectAO() as suggested above, and it found nothing in my data (which looked somewhat similar: short spikes coming off a continuous trend). After some searching, I found that the Hampel filter (function hampel() from package pracma) does what I needed. I am adding this here in case someone else is looking for a solution.
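A minimal base-R sketch of the Hampel filter idea, written out so it runs without pracma. For each point it takes a window of `k` neighbours on each side, computes the window median and a robust scale (1.4826 times the median absolute deviation), and flags the point if it lies more than `t0` scales from that median. Because the scale is local, a value such as 3.50 on the rising part of the curve can be flagged even though it is unremarkable globally. The helper name `hampel_flags` is hypothetical, and truncating the window at the ends of the series is an assumption of this sketch (pracma's version, as far as I know, leaves the first and last `k` points unchecked).

```r
# Data from the question
y <- c(0.59, 0.61, 0.59, 1.55, 1.33, 3.50, 1.00, 1.22, 2.50, 3.00,
       3.79, 3.98, 4.33, 4.45, 4.59, 4.72, 4.82, 4.90, 4.96, 7.92,
       5.01, 5.01, 4.94, 5.05, 5.04, 5.03, 5.06, 5.10, 5.04, 5.06,
       7.77, 5.07, 5.08, 5.08, 5.12, 5.12, 5.08, 5.17, 5.18)

# Hypothetical helper: Hampel identifier on a sliding window.
#   k  = number of neighbours on each side of the point
#   t0 = threshold in units of the local robust scale
# Returns the indices of the flagged observations.
hampel_flags <- function(x, k = 3, t0 = 3) {
  n <- length(x)
  flagged <- logical(n)
  for (i in seq_len(n)) {
    # Window of up to 2k + 1 points, truncated at the ends of the series
    window <- x[max(1, i - k):min(n, i + k)]
    m <- median(window)
    # MAD scaled by 1.4826 so it estimates the SD for Gaussian data
    s <- 1.4826 * median(abs(window - m))
    flagged[i] <- abs(x[i] - m) > t0 * s
  }
  which(flagged)
}

hampel_flags(y, k = 3, t0 = 4)  # 6 20 31
```

On this data, `k = 3` with `t0 = 4` flags exactly observations 6, 20 and 31. With the more common `t0 = 3`, observation 23 (4.94, just after the spike at obs 20) is flagged as well, so the threshold needs some tuning for the data at hand. The pracma equivalent should be along the lines of `pracma::hampel(y, k = 3, t0 = 4)$ind`.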

answered Oct 28 '22 04:10 by msp