
Extract data from a ggplot

I have made a plot using ggplot2's geom_histogram from a data frame. See the sample data below and the linked question showing the histogram: "Need to label each geom_vline with the factors using a nested ddply function and facet wrap".

I now need to make a data frame that contains the summarized data used to generate the ggplot above.

    Sector2 Family          Year  Length
    BUN     Acroporidae     2010  332.1300496
    BUN     Poritidae       2011  141.1467966
    BUN     Acroporidae     2012  127.479
    BUN     Acroporidae     2013  142.5940556
    MUR     Faviidae        2010  304.0405
    MUR     Faviidae        2011  423.152
    MUR     Pocilloporidae  2012  576.0295
    MUR     Poritidae       2013  123.8936667
    NTH     Faviidae        2010  60.494
    NTH     Faviidae        2011  27.427
    NTH     Pocilloporidae  2012  270.475
    NTH     Poritidae       2013  363.4635
George asked Aug 19 '14 07:08



2 Answers

To get the values that are actually plotted, you can use the function ggplot_build(), passing your plot as the argument.

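    library(ggplot2)

    # Faceted histogram with two vertical lines drawn in every facet
    p <- ggplot(mtcars, aes(mpg)) +
      geom_histogram() +
      facet_wrap(~cyl) +
      geom_vline(data = data.frame(x = c(20, 30)), aes(xintercept = x))

    # Build the plot to get at the computed layer data
    pg <- ggplot_build(p)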

This returns a list, one element of which is named data. That element holds the data frames with the values used in the plot; for a histogram, for example, it contains the y values (the same as count). If you use facets, the PANEL column shows which facet each row belongs to. If there is more than one geom_ in your plot, data will contain one data frame per layer: in my example there is one data frame for the histogram and another for the vlines.

    head(pg$data[[1]])
      y count         x ndensity ncount density PANEL group ymin ymax
    1 0     0  9.791667        0      0       0     1     1    0    0
    2 0     0 10.575000        0      0       0     1     1    0    0
    3 0     0 11.358333        0      0       0     1     1    0    0
    4 0     0 12.141667        0      0       0     1     1    0    0
    5 0     0 12.925000        0      0       0     1     1    0    0
    6 0     0 13.708333        0      0       0     1     1    0    0
          xmin     xmax
    1  9.40000 10.18333
    2 10.18333 10.96667
    3 10.96667 11.75000
    4 11.75000 12.53333
    5 12.53333 13.31667
    6 13.31667 14.10000

    head(pg$data[[2]])
      xintercept PANEL group xend  x
    1         20     1     1   20 20
    2         30     1     1   30 30
    3         20     2     2   20 20
    4         30     2     2   30 30
    5         20     3     3   20 20
    6         30     3     3   30 30
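As a rough sketch of how the histogram layer could be turned into a summary data frame, the snippet below keeps only the non-empty bins and totals the counts per facet. The names hist_data and panel_totals are made up for illustration, and bins = 30 is written out explicitly on the assumption that the default binning is what you want; adapt both to your own plot.

```r
library(ggplot2)

# A faceted histogram like the one in this answer (bins = 30 is an assumption)
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 30) +
  facet_wrap(~cyl)

pg <- ggplot_build(p)

# Layer 1 is the histogram: keep non-empty bins and the columns describing them
hist_data <- subset(pg$data[[1]], count > 0,
                    select = c(PANEL, xmin, xmax, count))

# Total number of observations drawn in each facet panel
panel_totals <- aggregate(count ~ PANEL, data = hist_data, FUN = sum)
```

PANEL is only a panel index, so you would still need to match it back to the facet variable (here cyl) yourself.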
Didzis Elferts answered Oct 07 '22 17:10


layer_data() is designed precisely for this:

layer_data(p, 1) 

It will give you the data of the first layer, same as ggplot_build(p)$data[[1]].

Its source code is indeed precisely:

function (plot, i = 1L) ggplot_build(plot)$data[[i]] 
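A minimal sketch of using the second argument to pick other layers, assuming a plot built like the one in the first answer (the variable names hist_layer and vline_layer are just for illustration):

```r
library(ggplot2)

# Histogram (layer 1) plus vertical lines (layer 2), faceted by cyl
p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 30) +
  facet_wrap(~cyl) +
  geom_vline(data = data.frame(x = c(20, 30)), aes(xintercept = x))

hist_layer  <- layer_data(p, 1)  # computed histogram bins
vline_layer <- layer_data(p, 2)  # vertical lines, repeated once per facet
```

Layers are numbered in the order they were added to the plot, so the index depends on how your own plot was constructed.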
Moody_Mudskipper answered Oct 07 '22 17:10