Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot: Subset a layer where data is passed using a pipe

Tags:

r

dplyr

ggplot2

I am trying to subset a layer of a plot where I am passing the data to ggplot through a pipe.

Here is an example:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(aes(x = Month, y = `Relative Value`))

This gives:

enter image description here

I would like only one of those lines to appear, which I would be able to do if something like this worked in the geom_line layer:

  geom_line(subset = .(Indicator == 1), aes(x = Month, y = `Relative Value`))

Edit:

Session info:

R version 3.2.1 (2015-06-18) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows Server 2012 x64 (build 9200)

locale: 2 LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252

attached base packages: 2 stats graphics grDevices utils
datasets methods base

other attached packages: 2 scales_0.3.0 lubridate_1.3.3 ggplot2_1.0.1 lazyeval_0.1.10 dplyr_0.4.3 RSQLite_1.0.0
readr_0.2.2 [8] RJDBC_0.2-5 DBI_0.3.1 rJava_0.9-7

loaded via a namespace (and not attached): 2 Rcpp_0.12.2
knitr_1.11 magrittr_1.5 MASS_7.3-40 munsell_0.4.2
lattice_0.20-31 [7] colorspace_1.2-6 R6_2.1.1 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 parallel_3.2.1 [13] grid_3.2.1
gtable_0.1.2 htmltools_0.2.6 yaml_2.1.13 assertthat_0.1
digest_0.6.8 [19] reshape2_1.4.1 memoise_0.2.1
rmarkdown_0.8.1 labeling_0.3 stringi_1.0-1 zoo_1.7-12
[25] proto_0.3-10

like image 749
tchakravarty Avatar asked Dec 26 '15 06:12

tchakravarty


1 Answers

tl;dr: Pass the data to that layer as a function that subsets the plot's data according to your criteria.


According to ggplots documentation on layers, you have 3 options when passing the data to a new layer:

  1. If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
  2. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.
  3. A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.

The first two options are the most usual ones, but the 3rd is perfect for our needs when the data has been modified through pyps.

In your example, adding data = function(x) subset(x,Indicator == 1) to the geom_line does the trick:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(data = function(x) subset(x,Indicator == 1), aes(x = Month, y = `Relative Value`))

This is the resulting plot

like image 50
mgiormenti Avatar answered Oct 31 '22 13:10

mgiormenti