> summary(mydata)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0      93     107     110     125     197 
> range=1.5*(125-93)
> upper_whisker=125+range
> lower_whisker=93-range
> upper_whisker
[1] 173
> lower_whisker
[1] 45
> boxplot(mydata)$stats
  [,1]
[1,]   56  #Lower whisker by boxplot
[2,]   93   
[3,]  107
[4,]  125
[5,]  173
I looked up the formula for the cut-offs beyond which points are considered outliers. It was:
 Above =>3rd Qu +(3rd Qu - 1st Qu)*1.5
 Below =>1st Qu -(3rd Qu - 1st Qu)*1.5
For some reason these don't match the stats returned by the boxplot function in R: I calculate a lower whisker of 45, but boxplot reports 56. I have a feeling I'm missing something silly.
Are they calculated differently? Or am I reading the wrong answer from boxplot?
Edit:
I've used https://www.kaggle.com/uciml/pima-indians-diabetes-database and ran
mydata=raw$Glucose[raw$Outcome==0]
EDIT2:
I suppose if
#max(min(x), Q1 - (IQR(x)*1.5)) #lower whisker
returns min(x), there shouldn't be any low outliers; but min(mydata) is 0, so here it returns 45 rather than the 56 reported by boxplot.
Edit 3: Clearer view of Quantile
quantile(mydata)
0%  25%  50%  75% 100% 
0   93  107  125  197 
Edit 4: Added vector as requested
c(85L, 89L, 116L, 115L, 110L, 139L, 103L, 126L, 99L, 97L, 145L, 
117L, 109L, 88L, 92L, 122L, 103L, 138L, 180L, 133L, 106L, 159L, 
146L, 71L, 105L, 103L, 101L, 88L, 150L, 73L, 100L, 146L, 105L, 
84L, 44L, 141L, 99L, 109L, 95L, 146L, 139L, 129L, 79L, 0L, 62L, 
95L, 112L, 113L, 74L, 83L, 101L, 110L, 106L, 100L, 107L, 80L, 
123L, 81L, 142L, 144L, 92L, 71L, 93L, 151L, 125L, 81L, 85L, 126L, 
96L, 144L, 83L, 89L, 76L, 78L, 97L, 99L, 111L, 107L, 132L, 120L, 
118L, 84L, 96L, 125L, 100L, 93L, 129L, 105L, 128L, 106L, 108L, 
154L, 102L, 57L, 106L, 147L, 90L, 136L, 114L, 153L, 99L, 109L, 
88L, 151L, 102L, 114L, 100L, 148L, 120L, 110L, 111L, 87L, 79L, 
75L, 85L, 143L, 87L, 119L, 0L, 73L, 141L, 111L, 123L, 85L, 105L, 
113L, 138L, 108L, 99L, 103L, 111L, 96L, 81L, 147L, 179L, 125L, 
119L, 142L, 100L, 87L, 101L, 197L, 117L, 79L, 122L, 74L, 104L, 
91L, 91L, 146L, 122L, 165L, 124L, 111L, 106L, 129L, 90L, 86L, 
111L, 114L, 193L, 191L, 95L, 142L, 96L, 128L, 102L, 108L, 122L, 
71L, 106L, 100L, 104L, 114L, 108L, 129L, 133L, 136L, 155L, 96L, 
108L, 78L, 161L, 151L, 126L, 112L, 77L, 150L, 120L, 137L, 80L, 
106L, 113L, 112L, 99L, 115L, 129L, 112L, 157L, 179L, 105L, 118L, 
87L, 106L, 95L, 165L, 117L, 130L, 95L, 0L, 122L, 95L, 126L, 139L, 
116L, 99L, 92L, 137L, 61L, 90L, 90L, 88L, 158L, 103L, 147L, 99L, 
101L, 81L, 118L, 84L, 105L, 122L, 98L, 87L, 93L, 107L, 105L, 
109L, 90L, 125L, 119L, 100L, 100L, 131L, 116L, 127L, 96L, 82L, 
137L, 72L, 123L, 101L, 102L, 112L, 143L, 143L, 97L, 83L, 119L, 
94L, 102L, 115L, 94L, 135L, 99L, 89L, 80L, 139L, 90L, 140L, 147L, 
97L, 107L, 83L, 117L, 100L, 95L, 120L, 82L, 91L, 119L, 100L, 
135L, 86L, 134L, 120L, 71L, 74L, 88L, 115L, 124L, 74L, 97L, 154L, 
144L, 137L, 119L, 136L, 114L, 137L, 114L, 126L, 132L, 123L, 85L, 
84L, 139L, 173L, 99L, 194L, 83L, 89L, 99L, 80L, 166L, 110L, 81L, 
154L, 117L, 84L, 94L, 96L, 75L, 130L, 84L, 120L, 139L, 91L, 91L, 
99L, 125L, 76L, 129L, 68L, 124L, 114L, 125L, 87L, 97L, 116L, 
117L, 111L, 122L, 107L, 86L, 91L, 77L, 105L, 57L, 127L, 84L, 
88L, 131L, 164L, 189L, 116L, 84L, 114L, 88L, 84L, 124L, 97L, 
110L, 103L, 85L, 87L, 99L, 91L, 95L, 99L, 92L, 154L, 78L, 130L, 
111L, 98L, 143L, 119L, 108L, 133L, 109L, 121L, 100L, 93L, 103L, 
73L, 112L, 82L, 123L, 67L, 89L, 109L, 108L, 96L, 124L, 124L, 
92L, 152L, 111L, 106L, 105L, 106L, 117L, 68L, 112L, 92L, 183L, 
94L, 108L, 90L, 125L, 132L, 128L, 94L, 102L, 111L, 128L, 92L, 
104L, 94L, 100L, 102L, 128L, 90L, 103L, 157L, 107L, 91L, 117L, 
123L, 120L, 106L, 101L, 120L, 127L, 162L, 112L, 98L, 154L, 165L, 
99L, 68L, 123L, 91L, 93L, 101L, 56L, 95L, 136L, 129L, 130L, 107L, 
140L, 107L, 121L, 90L, 99L, 127L, 118L, 122L, 129L, 110L, 80L, 
127L, 158L, 126L, 134L, 102L, 94L, 108L, 83L, 114L, 117L, 111L, 
112L, 116L, 141L, 175L, 92L, 106L, 105L, 95L, 126L, 65L, 99L, 
102L, 109L, 153L, 100L, 81L, 121L, 108L, 137L, 106L, 88L, 89L, 
101L, 122L, 121L, 93L)
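Your calculation was almost right. R caps the limits from that formula at the range of the data:
#max(min(x), Q1 - (IQR(x)*1.5)) #lower whisker
#min(max(x), Q3 + (IQR(x)*1.5)) #upper whisker
So a whisker never extends past min(x) or max(x). In addition, boxplot draws each whisker at the most extreme data point that still lies inside these limits, and it takes Q1 and Q3 from the hinges returned by fivenum(), which can differ slightly from quantile(). In your data the lower limit is 45, but the smallest observation at or above 45 is 56, so the whisker ends at 56.
Here is an example with no outliers, where the whiskers are simply min(x) and max(x):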
my_data <- mtcars$mpg
bp <- boxplot(my_data)
bp$stats
# [1,] 10.40 # lower whisker
# [2,] 15.35
# [3,] 19.20 # == median(my_data)
# [4,] 22.80
# [5,] 33.90 # upper whisker
max(min(my_data,na.rm=T), as.numeric(quantile(my_data, 0.25)) - (IQR(my_data)*1.5))
#[1] 10.4 #lower whisker
min(max(my_data,na.rm=T), as.numeric(quantile(my_data, 0.75)) + (IQR(my_data)*1.5))
#[1] 33.9 # upper whisker
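To see what happens when outliers are present, here is a minimal sketch of the rule that boxplot.stats() applies with its default coef = 1.5: the hinges come from fivenum(), and each whisker ends at the most extreme data point inside the fences. The helper name whisker_ends and the vector x are hypothetical, chosen only for illustration.

```r
# Hypothetical helper that mirrors the whisker rule of boxplot.stats()
# (default coef = 1.5). Not part of base R.
whisker_ends <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  hinges <- fivenum(x)[c(2, 4)]                 # lower and upper hinge
  step   <- coef * diff(hinges)                 # 1.5 * (upper - lower hinge)
  fences <- c(hinges[1] - step, hinges[2] + step)
  inside <- x >= fences[1] & x <= fences[2]     # points that are not outliers
  list(fences   = fences,
       whiskers = range(x[inside]),             # most extreme points inside
       outliers = x[!inside])
}

x <- c(1, 10, 11, 12, 13, 14, 15, 30)
res <- whisker_ends(x)
res$fences     # 4.5 20.5
res$whiskers   # 10 15
res$outliers   # 1 30

# Same whisker ends as reported by R itself
grDevices::boxplot.stats(x)$stats[c(1, 5)]
```

The fences (4.5 and 20.5) are not data points, so the whiskers stop at 10 and 15, the nearest observations inside them.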
I think there are a few things to clarify. First, you should always provide a reproducible example so that people can help you. An outlier is defined as a data point located outside the whiskers of the boxplot (i.e., more than 1.5 times the interquartile range above the upper quartile or below the lower quartile). A good way to see how this works is to simulate some Student's t data under a fixed random number generator seed:
set.seed(1)
mydata <- rt(100, df = 3)
boxplot(mydata)
summary(mydata)

Then we can calculate the interquartile range and the lower and upper bounds for outliers according to the rule above:
t <- as.vector(summary(mydata))
iqr.range <- t[5]-t[2]
upper_outliers <- t[5]+iqr.range*1.5
lower_outliers <- t[2]-iqr.range*1.5
Let's check which data points are flagged as outliers. The boxplot whiskers are then the data points closest to the lower and upper bounds that still lie inside them.
 mydata[mydata<lower_outliers]
 [1] -3.527006 -2.959327 -2.754192
 mydata[mydata>upper_outliers]
 [1] 3.080302 3.527205
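As a rough sketch of that last point, the whisker ends can be recovered as the range of the points that are not outliers. The variable names below are illustrative, and the quartiles come from quantile(); boxplot itself uses the hinges from fivenum(), so its bounds can differ slightly from these.

```r
set.seed(1)
mydata <- rt(100, df = 3)

# Quartiles and bounds, following the 1.5 * IQR rule
q <- quantile(mydata, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower_bound <- q[1] - 1.5 * iqr
upper_bound <- q[2] + 1.5 * iqr

# Points inside the bounds; the whiskers are their minimum and maximum
kept <- mydata[mydata >= lower_bound & mydata <= upper_bound]
whiskers <- range(kept)
whiskers

# For comparison: the whisker ends that boxplot() would draw
boxplot.stats(mydata)$stats[c(1, 5)]
```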