How can I identify the labels of outliers in an R boxplot?

Tags: plot, r, outliers

The R boxplot function is a very useful way to look at data: it quickly gives you a visual summary of the approximate location and spread of your data, and shows how many outliers there are. In addition, I'd like to identify the outliers, so that I can quickly find problems in the dataset.

The values of these outliers can be accessed with myplot$out, where myplot is the object returned by boxplot(). Unfortunately, the labels of these outliers seem to be unavailable. There are some packages aimed at displaying the labels on the plot itself (http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/), but they don't work well, and I just want to list the outliers; I don't need them on the plot itself.
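A minimal sketch of where myplot$out comes from, assuming a single numeric vector with no missing values (the simulated x below is an assumption standing in for your data). boxplot() hands each group to boxplot.stats(), which, with the default coefficient of 1.5, flags any point lying more than 1.5 times the hinge spread (the length of the box) below the lower hinge or above the upper hinge. Recomputing that rule by hand shows that $out holds only the values, not their positions:

```r
# Simulated data (assumption: any numeric vector without NAs behaves the same way)
set.seed(2)
x <- rlnorm(26)

# What boxplot() computes, without drawing anything
myplot <- boxplot(x, plot = FALSE)

# Tukey's five numbers: min, lower hinge, median, upper hinge, max
h   <- fivenum(x)
iqr <- h[4] - h[2]   # hinge spread, i.e. the length of the box

# Default rule (coef = 1.5): points beyond 1.5 * hinge spread are outliers
manual_out <- x[x < h[2] - 1.5 * iqr | x > h[4] + 1.5 * iqr]

myplot$out   # outlier values only, no indices or labels
manual_out   # the same values, recomputed by hand
```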

Any ideas?

asked Jun 21 '12 08:06 by static_rtti


2 Answers

You've done most of the hard work yourself. All that remains is a comparison:

## First create some data
## (you should include this in your question)
set.seed(2)
dd = data.frame(x = rlnorm(26), y=LETTERS)

Grab the outliers:

outliers = boxplot(dd$x, plot=FALSE)$out

Extract the matching rows from the original data frame:

dd[dd$x %in% outliers,]
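Since the question only asks for a list of labels, the same logical test can index the label column directly, or be wrapped in which() to get row numbers. A minimal sketch, recreating dd from above so the block runs on its own (is_out is just an illustrative name):

```r
# Same data as above
set.seed(2)
dd <- data.frame(x = rlnorm(26), y = LETTERS)
outliers <- boxplot(dd$x, plot = FALSE)$out

is_out <- dd$x %in% outliers   # one TRUE/FALSE per row of dd
dd$y[is_out]                   # labels of the outlying rows
which(is_out)                  # their row numbers in dd
```

Indexing dd$y directly returns just the labels; which() is handy when you need the row numbers instead, for example to inspect or fix those rows later.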

Further explanation:

The variable dd$x is the vector of 26 numbers, and outliers contains the values of the outliers (type dd$x and outliers in your R console to see them). The command

dd$x %in% outliers

tests, for each element of dd$x, whether its value appears in outliers, giving a logical vector:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE <snip>

The square-bracket notation dd[dd$x %in% outliers, ] returns the rows of the data frame dd for which dd$x %in% outliers is TRUE.
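The same two steps on a tiny made-up data frame (the toy name and its values are purely illustrative), where the output is easy to check by eye:

```r
# Hypothetical toy data: pretend 5 and 9 are the outlier values
toy  <- data.frame(x = c(1, 5, 9), y = c("a", "b", "c"))

keep <- toy$x %in% c(5, 9)   # is each x among the "outlier" values?
keep                         # [1] FALSE  TRUE  TRUE

# Rows 2 and 3; the empty slot after the comma keeps all columns
toy[keep, ]
```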

answered Sep 23 '22 04:09 by csgillespie


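I suggest which(x < myplot$stats[1] | x > myplot$stats[5]), where x is your data and myplot is the object returned by boxplot(x). For a single boxplot, myplot$stats[1] and myplot$stats[5] are the ends of the lower and upper whiskers, so this gives the indices of the points beyond the whiskers, which you can then use to look up their labels.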
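A worked sketch of this approach on the same kind of simulated data as the first answer. The labels vector is a hypothetical stand-in for whatever identifies your observations. Because myplot$stats is a 5 x 1 matrix for a single boxplot, myplot$stats[1] and myplot$stats[5] are the whisker ends, and every outlier lies strictly outside them:

```r
set.seed(2)
x      <- rlnorm(26)   # your data
labels <- LETTERS      # hypothetical labels, one per value of x
myplot <- boxplot(x, plot = FALSE)

# Linear indices 1 and 5 pick the lower and upper whisker ends
# (valid here because there is only one group, i.e. one column)
idx <- which(x < myplot$stats[1] | x > myplot$stats[5])

idx           # positions of the outliers in x
labels[idx]   # their labels
```

With several groups, stats has one column per group, so the linear indices [1] and [5] would refer only to the first group's whiskers.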

answered Sep 24 '22 04:09 by danas.zuokas