How do I access the data frame that has been passed to ggplot()?

I want to set the string N=xxx as the title of my figure, where xxx is the number of observations in the data frame that I pass as the data argument to ggplot(). In my current code, I explicitly pass that data frame a second time, to nrow() inside the sprintf() call that I use in labs():

library(ggplot2)

ggplot(mtcars, aes(mpg, hp)) +
  labs(title = sprintf("N=%i", nrow(mtcars))) +
  geom_point()

This does produce the desired title, but it won't work with more complex tasks: I use a dplyr pipe to construct the data frame that is being plotted, and as this is a time-consuming process, I wouldn't want to repeat the pipe a second time to obtain the number of rows like in the example.

So, how do I access the data frame that has been passed as an argument to ggplot() from within the argument specifications of the functions that are used to modify the plot?

Asked by Schmuddi on Jul 13 '17 at 18:07



2 Answers

library(ggplot2)
library(magrittr)

mtcars %>% {
  ggplot(., aes(mpg, hp)) +
    labs(title = paste("N =", nrow(.))) +
    geom_point()
}

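Note that when the whole ggplot call is wrapped in {...} curly braces, you must pass the . (dot) pronoun as the data argument yourself, as in ggplot(., ...), because magrittr does not insert the left-hand side as the first argument of a braced block. After that, you can refer back to the same object with . anywhere inside the braces.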
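magrittr's documentation describes a braced right-hand side as a lambda, which behaves much like an anonymous function of `.`. As a sketch of the same idea, the block can be written as an ordinary function that takes the data frame as its argument and is applied at the end of the pipe. `plot_with_n()` is a hypothetical helper name used only for illustration.

```r
library(ggplot2)
library(magrittr)

# Hypothetical helper: df is whatever data frame arrives at the end of the pipe,
# so it can be used both as the plot data and for the title
plot_with_n <- function(df) {
  ggplot(df, aes(mpg, hp)) +
    labs(title = paste("N =", nrow(df))) +
    geom_point()
}

p <- mtcars %>% plot_with_n()
```

A named function is easier to reuse across plots, at the cost of hard-coding the aesthetics; they would need to become extra arguments if they vary.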


Answered by Brian on Sep 28 '22 at 01:09


Another option takes advantage of a different magrittr pipelining feature: the tee operator %T>%.

library(ggplot2)
library(magrittr)

# Pre-assign nr so it is clear where the out-of-scope variable is defined
nr <- "oops"

mtcars %T>%
  { nr <<- nrow(.) } %>%
  ggplot(aes(mpg, hp)) +
    labs(title = sprintf("N=%i", nr)) +
    geom_point()

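(This can also be done with dplyr's do(), as in do({nr <<- nrow(.); .}) %>%. The block has to end with . because do() requires its unnamed result to be a data frame, and that data frame is what gets passed on to ggplot().)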
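What makes the tee version work is that `%T>%` evaluates its right-hand side only for the side effect and then passes its left-hand side on unchanged, so `ggplot()` still receives the data frame. A minimal check of that behaviour, using `nrow()` as the discarded right-hand side:

```r
library(magrittr)

# %>% passes on the result of the right-hand side
n <- mtcars %>% nrow()

# %T>% evaluates nrow(mtcars), discards the result, and passes mtcars on
out <- mtcars %T>% nrow()
```

Here `n` is the row count, while `out` is the original `mtcars` data frame.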

This differs from Brian's answer in two ways:

  1. Subjectively "cleaner looking", in that the ggplot code is not indented within a code block. (As commented, though, the blending of different pipelines could be a negative as well.)

  2. It has a side effect: nr is created outside of both the magrittr pipeline and the ggplot2 + chain. Pre-assigning nr makes it explicit where the variable lives, which I think mitigates reaching outside of the local environment, but it's still a little sloppy.

Answered by r2evans on Sep 28 '22 at 02:09
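Neither answer uses it, but a ggplot object also stores the data frame it was given as its `data` element. A further option, sketched here under the assumption that ggplot2 and dplyr are installed, is to build the plot once at the end of the (possibly slow) pipe and add the title afterwards from `p$data`. The `filter(cyl == 4)` step is only a stand-in for the real dplyr pipe.

```r
library(ggplot2)
library(dplyr)

# Stand-in for the expensive dplyr pipe: it runs only once
p <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(aes(mpg, hp)) +
  geom_point()

# The data frame that reached ggplot() is kept inside the plot object
n_obs <- nrow(p$data)
p <- p + labs(title = sprintf("N=%i", n_obs))
```

Printing `p` then draws the figure with the title N=11. Unlike the tee approach, no variable is assigned from inside the pipeline.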