Documentation on internal variables in ggplot, esp. PANEL

Tags: r, ggplot2

The answer to this question uses a PANEL variable which seems to be internal to ggplot. But searching the ggplot documentation and also Hadley Wickham's book, I can find no reference to it at all. Is this documented anywhere?

Also, looking at the code for stat_bin(...), there is evidently a vector count created (which contains the count of y for each unique x??). This is also accessible in aes(...) but, again, I can find no documentation.

So my question is: is there a place where all of these internal variables are documented, or must one just go to the code?

jlhoward asked Dec 16 '13 22:12


2 Answers

There are some surprising gaps in the help pages for ggplot2 (the help page for ?layer, to which many other pages refer users, is a particularly egregious one). These "variables" have been around for years and, like you, I cannot find much in the online help or the package NEWS. SO's search facility is not much help because it strips off the leading and trailing dots and shows everything with "count". Only examples of their use can be found in cran.r-project.org/web/packages/ggplot2/ggplot2.pdf.

Google is somewhat more helpful: the search string ggplot2 ..counts.. delivers many informative hits. From context one forms the sense that these are not so much special variables as combined functions and program controls that implicitly transform the named arguments. They do seem to be implicitly mentioned in ?stat_bin {ggplot2}, albeit without the dots, and it appears that all four of these stat-variable-functions are calculated at the same time.

When I searched the pdf you linked to, I found on pages 57-58 the tables (#4.3, 4.4) of "statistics" and "aesthetics" that you were asking for, but to my surprise they did not include count. Those tables are in section 4.7, which describes "stats".

(I have noticed improvement over the last couple of years in some of the pages to which these complaints were directed.)

IRTFM answered Nov 12 '22 13:11


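I think PANEL is a column in the per-layer data of a built plot. ggplot_build(x)$data is a list with one data frame per layer, so you can get the column names for the first layer with:

names(ggplot_build(x)$data[[1]])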
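To make this concrete, here is a minimal sketch of inspecting the built data of a faceted plot. The mtcars data set, the bin count, and the object names p, built, and layer1 are assumptions chosen only for illustration. They are not part of the original answer.

```r
library(ggplot2)

# Illustrative plot: a histogram of mpg, one facet panel per cylinder count.
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ cyl)

# ggplot_build() runs the stats and scales and returns the computed data.
built <- ggplot_build(p)

# built$data is a list with one data frame per layer; take the first layer.
layer1 <- built$data[[1]]

# Column names include stat outputs such as "count" and "density",
# plus bookkeeping columns such as "PANEL" and "group".
print(names(layer1))

# PANEL records which facet panel each row belongs to.
print(unique(layer1$PANEL))
```

Here PANEL appears to be bookkeeping that ggplot2 adds when it splits the data across facets, which would explain why it shows up in the built data rather than in the user-facing help pages.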

For the count and density variables, you can refer to Hadley's book, page 69:

Both the histogram and frequency polygon geom use stat_bin. This statistic produces two output variables count and density. The count is the default as it is most interpretable. The density is basically the count divided by the total count, and is useful when you want to compare the shape of the distributions, not the overall size. You will often prefer this when comparing the distribution of subsets that have different sizes.
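A short sketch of the two variables described in the quote, again using mtcars purely as an assumed example. It uses after_stat(), which newer ggplot2 releases provide for referring to computed variables; the older ..count.. and ..density.. spellings refer to the same columns. Note that the computed density is also scaled by the bin width, so the bar areas sum to 1.

```r
library(ggplot2)

# Histogram with the computed count on the y axis (the default for stat_bin).
p_count <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(count)), bins = 10)

# Same bins, but with the computed density on the y axis.
# Older code writes this as aes(y = ..density..).
p_density <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10)

# Both variables are computed together and stored in the built layer data.
d <- ggplot_build(p_density)$data[[1]]

# The counts add up to the number of observations.
total_count <- sum(d$count)

# Density times bin width gives each bar's area; the areas add up to 1.
total_area <- sum(d$density * (d$xmax - d$xmin))

print(total_count)
print(total_area)
```

Because count and density sit side by side in the same data frame, switching the y aesthetic between them changes only the vertical scale, not the binning.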

agstudy answered Nov 12 '22 14:11