
How to plot 95 percentile and 5 percentile on ggplot2 plot with already calculated values?

I have this dataset and use this R code:

library(reshape2)
library(ggplot2)
library(RGraphics)
library(gridExtra)

long <- read.csv("long.csv")
ix <- 1:14

ggp2 <- ggplot(long, aes(x = id, y = value, fill = type)) +
    geom_bar(stat = "identity", position = "dodge") +
    geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3, angle = 0) +
    scale_x_continuous("Nodes", breaks = ix) +
    scale_y_continuous("Throughput (Mbps)", limits = c(0,1060)) +
    scale_fill_discrete(name="Legend",
                        labels=c("Inside Firewall (Dest)",
                                 "Inside Firewall (Source)",
                                 "Outside Firewall (Dest)",
                                 "Outside Firewall (Source)")) +
    theme_bw() +
    theme(legend.position="right") +
    theme(legend.title = element_text(colour="black", size=14, face="bold")) +
    theme(legend.text = element_text(colour="black", size=12, face="bold")) +
    facet_grid(type ~ .)

plot(ggp2)

to get the following result: [image: resulting plot]

Now I need to add the 95th percentile and 5th percentile to the plot. The numbers are already calculated in this dataset (the NFPnumbers column holds the 95th percentile and the FPnumbers column holds the 5th percentile).

It seems boxplot() may work here but I am not sure how to use it with ggplot. stat_quantile(quantiles = c(0.05,0.95)) could work as well, but the function calculates the numbers itself. Can I use my numbers here?

I also tried:

geom_line(aes(x = id, y = long$FPnumbers)) +
geom_line(aes(x = id, y = long$NFPnumbers))

but the result did not look good enough.

geom_boxplot() did not work either:

geom_boxplot(aes(x = id, y = long$FPnumbers)) +
geom_boxplot(aes(x = id, y = long$NFPnumbers))
Rlearner asked Oct 02 '22 03:10



1 Answer

To draw a boxplot from precomputed statistics, use geom_boxplot() with stat = "identity" and supply all five aesthetics: ymin, lower, middle, upper and ymax. The ymin and ymax values are not in the dataset, so I calculated them.

ggplot(long, aes(x = factor(id), y = value, fill = type)) +
  geom_boxplot(aes(ymin = FPnumbers * 0.5, lower = FPnumbers, middle = value,
                   upper = NFPnumbers, ymax = NFPnumbers * 1.2),
               stat = "identity") +
  xlab("Nodes") +
  ylab("Throughput (Mbps)") +
  scale_fill_discrete(name="Legend",
                      labels=c("Inside Firewall (Dest)", "Inside Firewall (Source)",
                               "Outside Firewall (Dest)", "Outside Firewall (Source)")) +
  theme_bw() +
  theme(legend.position="right",
        legend.title = element_text(colour="black", size=14, face="bold"),
        legend.text = element_text(colour="black", size=12, face="bold")) +
  facet_grid(type ~ .)

The result:

[image: resulting plot]


The dataset you provided contains the value, FPnumbers and NFPnumbers variables. As FPnumbers and NFPnumbers represent the 5th and 95th percentiles, I assume that value represents the mean, so I use it as the middle line of each box. For this solution to work, you also need min and max values for each node. I guess you have them somewhere in your raw data.
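If the raw measurements are available, the five numbers could be computed along the following lines. This is only a sketch: the `raw` data frame, its column names (`id`, `type`, `throughput`) and the helper `five_numbers()` are made up for illustration and are not part of the question's dataset.

```r
# Hypothetical raw measurements: one row per throughput sample.
# The data and the column names are assumptions for illustration only.
set.seed(1)
raw <- data.frame(
  id         = rep(1:3, each = 40),
  type       = rep(c("inside", "outside"), times = 60),
  throughput = runif(120, min = 100, max = 1000)
)

# Hypothetical helper: the five values geom_boxplot(stat = "identity") needs.
five_numbers <- function(x) {
  c(ymin   = min(x),
    lower  = unname(quantile(x, 0.05)),  # 5th percentile
    middle = mean(x),
    upper  = unname(quantile(x, 0.95)),  # 95th percentile
    ymax   = max(x))
}

# One row of summary statistics per node/type combination.
stats <- aggregate(throughput ~ id + type, data = raw, FUN = five_numbers)

# aggregate() stores the five numbers in a matrix column; flatten it
# into ordinary columns and strip the "throughput." prefix.
stats <- do.call(data.frame, stats)
names(stats) <- sub("^throughput\\.", "", names(stats))

print(stats)
```

The resulting columns can then be mapped directly in aes() (ymin = ymin, lower = lower, and so on) instead of the scaled FPnumbers and NFPnumbers values.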

However, as the min and max values are not provided in the dataset, I made them up from FPnumbers and NFPnumbers. The multiplication factors of 0.5 and 1.2 are arbitrary; they are just a way of creating fictitious min and max values.
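If fictitious min and max values are undesirable, a different technique (not the boxplot approach above) is to keep the original bars and draw the precomputed percentiles as error bars with geom_errorbar(). A minimal sketch, assuming the column names from the question; the `demo` data frame and its numbers are invented stand-ins for long.csv:

```r
library(ggplot2)

# Made-up stand-in for long.csv; only the column names
# (id, type, value, FPnumbers, NFPnumbers) follow the question.
demo <- data.frame(
  id         = rep(1:3, times = 2),
  type       = rep(c("inside", "outside"), each = 3),
  value      = c(900, 850, 700, 400, 350, 300),
  FPnumbers  = c(800, 760, 600, 330, 290, 240),  # 5th percentile
  NFPnumbers = c(980, 940, 790, 470, 420, 360)   # 95th percentile
)

p <- ggplot(demo, aes(x = id, y = value, fill = type)) +
  geom_bar(stat = "identity") +
  # Whiskers run from the 5th to the 95th percentile; no min/max needed.
  geom_errorbar(aes(ymin = FPnumbers, ymax = NFPnumbers), width = 0.3) +
  scale_x_continuous("Nodes", breaks = 1:3) +
  ylab("Throughput (Mbps)") +
  theme_bw() +
  facet_grid(type ~ .)

print(p)
```

This uses only the values that are actually in the dataset, at the cost of not showing a box.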

Jaap answered Oct 03 '22 18:10
