I'm successfully using the boxplot function to generate... boxplots. Now I need to generate tables containing the stats that boxplot calculates in order to create plots.
I do this by using the plot=FALSE option.
The problem is that this produces data in a rather bizarre format that I simply can't do anything with. Here's an example:
structure(list(stats = structure(c(178.998262143545, 182.227431564442,
202.108456373209, 220.375358994654, 221.990406228232, 216.59986775699,
217.054997032148, 228.509462713206, 267.070720949859, 284.832378859975,
189.864120937198, 201.876421960518, 219.525439081472, 234.260088973545,
279.343359793024, 209.472617639903, 209.526516071858, 214.785213079737,
230.027361556731, 240.0647114578, 202.057148813419, 207.375619207685,
220.093663781351, 226.246698737471, 240.343646265795), .Dim = c(5L,
5L)), n = c(4, 6, 8, 4, 8), conf = structure(c(171.971593703341,
232.245319043076, 196.247705331772, 260.771220094641, 201.435457751239,
237.615420411705, 198.589545146688, 230.980881012787, 209.552007821332,
230.635319741371), .Dim = c(2L, 5L)), out = numeric(0), group = numeric(0),
names = c("U", "UM", "M", "LM", "L")), .Names = c("stats", "n", "conf", "out", "group",
"names"))
What I want is a table for each of the stats -- min, max, median and the quartiles -- and their values for each group (the ones in "names").
Could somebody give me a hand with this? I'm very much an R beginner.
Thanks in advance!
In a box plot, an outlier is a data point that lies outside the whiskers. By the usual convention, that means more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile, i.e. below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
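A minimal sketch of that rule in R, assuming a made-up sample vector x (not data from the question). R's boxplot machinery takes the box edges from fivenum(), the "hinges", which can differ slightly from quantile(), so the sketch uses hinges as well:

```r
# Hypothetical sample with one obviously extreme value.
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 45)

# fivenum() returns min, lower hinge, median, upper hinge, max.
hinges <- fivenum(x)[c(2, 4)]
iqr <- diff(hinges)

# Fences at 1.5 * IQR beyond each hinge.
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Points beyond either fence are flagged as outliers.
manual_out <- x[x < lower_fence | x > upper_fence]

# boxplot.stats() applies the same rule (coef = 1.5 by default).
builtin_out <- boxplot.stats(x)$out
```

Both manual_out and builtin_out should contain the same points.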
boxplot takes a number of optional arguments. Some of the frequently used ones are main to give the title, xlab and ylab to label the axes, and col to set the color. Additionally, horizontal = TRUE draws the plot horizontally and notch = TRUE adds a notch to the box.
How to Read a Box Plot. A boxplot shows a five-number summary in a chart. The main part of the chart (the “box”) shows where the middle portion of the data is: the interquartile range. At the ends of the box, you'll find the first quartile (the 25% mark) and the third quartile (the 75% mark).
How to interpret the box plot? The bottom of the box is the 25th percentile and the top is the 75th percentile of the data. So the box represents the middle 50% of all the data points, the core region where the data is situated.
boxplot returns a structure in R called a list.
A list is more-or-less a data container where you can refer to elements by name.
If you do A <- boxplot(...), you can access the names with A$names, the conf with A$conf, etc.
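As a minimal sketch of that name-based access (the sample data and the variables values and groups are made up for illustration; they are not the data from the question):

```r
# Hypothetical sample data: three groups of ten values each.
set.seed(1)
values <- c(rnorm(10, 200, 10), rnorm(10, 220, 10), rnorm(10, 210, 10))
groups <- factor(rep(c("U", "M", "L"), each = 10), levels = c("U", "M", "L"))

# plot = FALSE computes the statistics without drawing anything.
A <- boxplot(values ~ groups, plot = FALSE)

component_names <- names(A)  # which components the list holds
group_labels <- A$names      # access a component with $
group_sizes <- A[["n"]]      # or with [[ ]] and the name as a string
```

Typing names(A) at the console is a quick way to see what is available before picking a component.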
So, looking at the boxplot helpfile ?boxplot under Value: (which tells you what boxplot returns), we see that it returns a list with the following components:
   stats: a matrix, each column contains the extreme of the lower
          whisker, the lower hinge, the median, the upper hinge and the
          extreme of the upper whisker for one group/plot. If all the
          inputs have the same class attribute, so will this component.
       n: a vector with the number of observations in each group.
    conf: a matrix where each column contains the lower and upper
          extremes of the notch.
     out: the values of any data points which lie beyond the extremes
          of the whiskers.
   group: a vector of the same length as ‘out’ whose elements indicate
          to which group the outlier belongs.
   names: a vector of names for the groups.
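To see how stats, out and group fit together, here is a small sketch using a made-up list called samples (an assumption for illustration, not the asker's data), in which one group contains an extreme value:

```r
# Hypothetical data: group "b" contains one extreme value (50).
samples <- list(
  a = c(10, 12, 13, 15, 18),
  b = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)
)

A <- boxplot(samples, plot = FALSE)

stats_b <- A$stats[, 2]        # whisker ends, hinges and median for group "b"
outliers <- A$out              # values lying beyond the whiskers
outlier_groups <- A$group      # column index of the group each outlier belongs to

# The upper whisker end stops at the largest non-outlying value,
# so it is smaller than the true maximum of group "b".
whisker_below_max <- A$stats[5, 2] < max(samples$b)
```

When out is empty, as in the question's output, the first and last rows of stats coincide with the true minimum and maximum of each group.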
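So the table for each of the stats is in A$stats: each column belongs to a group and contains the min, lower quartile, median, upper quartile, and max. (Strictly, these are the whisker ends and hinges described in the help text; the whisker ends are the true min and max whenever out is empty, as it is in your example.)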
You could do:
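A <- boxplot(...)             # your existing call, e.g. with plot=FALSE
mytable <- A$stats            # matrix with 5 rows and one column per group
colnames(mytable) <- A$names  # label the columns with the group names
rownames(mytable) <- c('min', 'lower quartile', 'median', 'upper quartile', 'max')
mytable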
which returns (for mytable):
                      U       UM        M       LM        L
min            178.9983 216.5999 189.8641 209.4726 202.0571
lower quartile 182.2274 217.0550 201.8764 209.5265 207.3756
median         202.1085 228.5095 219.5254 214.7852 220.0937
upper quartile 220.3754 267.0707 234.2601 230.0274 226.2467
max            221.9904 284.8324 279.3434 240.0647 240.3436
Then you can refer to it like mytable['min','U'].
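If you would rather have one row per group, for example to export the table, something along these lines might work. This is only a sketch: the sample data, the variable names by_group and medians, and the file name in the last comment are all made up for illustration.

```r
# Hypothetical sample data standing in for the real measurements.
set.seed(42)
values <- c(rnorm(8, 200, 15), rnorm(8, 230, 20), rnorm(8, 215, 10))
groups <- factor(rep(c("U", "M", "L"), each = 8), levels = c("U", "M", "L"))

A <- boxplot(values ~ groups, plot = FALSE)

# Label rows and columns in one step via dimnames.
# Note: "min" and "max" are really the whisker ends; they equal the true
# min and max only for groups that have no outliers in A$out.
mytable <- A$stats
dimnames(mytable) <- list(
  c("min", "lower quartile", "median", "upper quartile", "max"),
  A$names
)

# Transpose so each group becomes a row, then convert to a data frame
# and attach the number of observations per group.
by_group <- as.data.frame(t(mytable))
by_group$n <- A$n

# One statistic across all groups, as a named vector.
medians <- mytable["median", ]

# write.csv(by_group, "boxplot_stats.csv")  # hypothetical file name
```

Keeping the matrix form is handy for lookups like mytable['min','U'], while the data frame form is usually easier to write out or merge with other tables.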