 

Boxplot using summary instead of raw data

Tags:

r

ggplot2

I am still new to ggplot2. I want to draw a box plot, but instead of the raw data I only have the summary statistics.

Page_Type     ID  Count   min       5%      25%     50%      75%      95%    Max          Avg
        3  24559    173   408   479.45   615.25   800.5  1547.25   4436.8   7068  1350.138462
        3  24560    101     0      480      631     871     1762     5183  65177  2702.245902
        6  24559     69   490      664     1181    1807     3221   4845.5   6397   2287.45098
        6  24560     10  1086   1254.4     1928    1970     2007   5236.6   6044         2607
       46  24559     49   217   252.45   438.75     595     1198  2647.15   4316  939.6666667
       46  24560     31   266      337      467     640     1123   2531.6   5232  989.2758621
       69  24559    424   644    761.8      957    1292     2212   4938.6  11246  1881.785467
       69  24560    216   601   848.85  1060.25  1488.5     2465   5314.7   7981  2094.007692
       82  24559     62   922   1018.2     1305    1534     1966   3313.8  22461  2325.810811
       82  24560    137   630    926.6     1156    1468     2281   3764.6  11364  1922.252632

the dput output is as follows:

structure(list(Page_Type = c(3L, 3L, 6L, 6L, 46L, 46L, 69L, 69L, 
82L, 82L), ID = c(24559L, 24560L, 24559L, 24560L, 24559L, 24560L, 
24559L, 24560L, 24559L, 24560L), Count = c(173L, 101L, 69L, 10L, 
49L, 31L, 424L, 216L, 62L, 137L), min = c(408L, 0L, 490L, 1086L, 
217L, 266L, 644L, 601L, 922L, 630L), X5. = c(479.45, 480, 664, 
1254.4, 252.45, 337, 761.8, 848.85, 1018.2, 926.6), X25. = c(615.25, 
631, 1181, 1928, 438.75, 467, 957, 1060.25, 1305, 1156), X50. = c(800.5, 
871, 1807, 1970, 595, 640, 1292, 1488.5, 1534, 1468), X75. = c(1547.25, 
1762, 3221, 2007, 1198, 1123, 2212, 2465, 1966, 2281), X95. = c(4436.8, 
5183, 4845.5, 5236.6, 2647.15, 2531.6, 4938.6, 5314.7, 3313.8, 
3764.6), Max = c(7068L, 65177L, 6397L, 6044L, 4316L, 5232L, 11246L, 
7981L, 22461L, 11364L), Avg = c(1350.138462, 2702.245902, 2287.45098, 
2607, 939.6666667, 989.2758621, 1881.785467, 2094.007692, 2325.810811, 
1922.252632)), .Names = c("Page_Type", "ID", "Count", "min", 
"X5.", "X25.", "X50.", "X75.", "X95.", "Max", "Avg"), class = "data.frame", row.names = c(NA, 
-10L))

There are 5 page types and each page type has 2 IDs. I want to show the various summary metrics (min, 5%, 25%, ...) as a box plot. I am OK with skipping the 5% and 95% data points to get the more traditional look. How do I create a box plot from this data?

There is also a Count column, which shows how many points were used to compute each summary. If this can be overlaid on the same plot, great; otherwise it can go on a separate plot.

Rohit Das asked Feb 15 '23 13:02


1 Answer

You can draw the box plot with geom_boxplot() by supplying your own ymin, lower, middle, upper and ymax values. In that case you must add stat="identity" inside geom_boxplot(), so that ggplot2 draws the supplied values instead of computing the statistics from raw data.

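library(ggplot2)
# df is the data frame from the question's dput() output; the box spans the 25% to 75% quartiles
ggplot(df, aes(x = as.factor(Page_Type), fill = as.factor(ID),
               ymin = min, lower = X25., middle = X50., upper = X75., ymax = Max)) +
  geom_boxplot(stat = "identity")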

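The question also asks about showing the Count column on the same plot. One possible way, offered only as a sketch and not as part of the original answer, is to add a geom_text() layer that prints each count above its box. The "n=" label format, the placement just above the 75% value, and the dodge width of 0.9 (chosen to match the default box width when stat="identity" is used) are all assumptions made for illustration. The data frame is rebuilt from the question's values so the block runs on its own; the 5%, 95% and Avg columns are left out.

```r
library(ggplot2)

# Summary rows copied from the question (5%, 95% and Avg columns omitted)
df <- data.frame(
  Page_Type = rep(c(3L, 6L, 46L, 69L, 82L), each = 2),
  ID        = rep(c(24559L, 24560L), times = 5),
  Count     = c(173L, 101L, 69L, 10L, 49L, 31L, 424L, 216L, 62L, 137L),
  min       = c(408, 0, 490, 1086, 217, 266, 644, 601, 922, 630),
  X25.      = c(615.25, 631, 1181, 1928, 438.75, 467, 957, 1060.25, 1305, 1156),
  X50.      = c(800.5, 871, 1807, 1970, 595, 640, 1292, 1488.5, 1534, 1468),
  X75.      = c(1547.25, 1762, 3221, 2007, 1198, 1123, 2212, 2465, 1966, 2281),
  Max       = c(7068, 65177, 6397, 6044, 4316, 5232, 11246, 7981, 22461, 11364)
)

p <- ggplot(df, aes(x = factor(Page_Type), fill = factor(ID))) +
  # Boxes drawn directly from the precomputed summary values
  geom_boxplot(aes(ymin = min, lower = X25., middle = X50.,
                   upper = X75., ymax = Max),
               stat = "identity") +
  # Assumed label style: "n=<Count>" placed just above the top of each box
  geom_text(aes(y = X75., label = paste0("n=", Count), group = factor(ID)),
            position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
  labs(x = "Page_Type", y = "Value", fill = "ID")

print(p)
```

Because one Max value (65177) is far larger than the rest, the boxes may look compressed. Zooming with coord_cartesian(ylim = ...) is one option, since it does not drop any of the supplied values.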

Didzis Elferts answered Feb 18 '23 11:02