
Subset/filter in dplyr chain with ggplot2

I'd like to make a slopegraph, along the lines (no pun intended) of this. Ideally, I'd like to do it all in a dplyr-style chain, but I hit a snag when I try to subset the data to add specific geom_text labels. Here's a toy example:

# make tbl:

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)


# plot: 

df %>% filter(area == "Health") %>% 
  ggplot() + 
  geom_line(aes(x = as.factor(year), y = value, 
            group = sub_area, color = sub_area), size = 2) + 
  geom_point(aes(x = as.factor(year), y = value, 
            group = sub_area, color = sub_area), size = 2) +
  theme_minimal(base_size = 18) + 
  geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"), 
  aes(x = as.factor(year), y = value, 
  color = sub_area, label = area), size = 6, hjust = 1)

But this gives me:

Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) : object '.' not found

Using subset instead of dplyr::filter gives me a similar error. What I've found on SO/Google is this question, which addresses a slightly different problem.

What is the correct way to subset the data in a chain like this?

Edit: My reprex is a simplified example; in the real work I have one long chain. Mike's comment below works for the first case, but not the second.

RobertMyles asked May 16 '17 17:05



2 Answers

If you wrap the plotting code in {...}, you can use . to specify exactly where the previously calculated results are inserted:

library(tidyverse)

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

df %>% filter(area == "Health") %>% {
    ggplot(.) +    # . is the filtered data coming out of the pipe
        geom_line(aes(x = as.factor(year), y = value, 
                      group = sub_area, color = sub_area), size = 2) + 
        geom_point(aes(x = as.factor(year), y = value, 
                       group = sub_area, color = sub_area), size = 2) +
        theme_minimal(base_size = 18) + 
        geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),    # and here
                  aes(x = as.factor(year), y = value, 
                      color = sub_area, label = area), size = 6, hjust = 1)
}

While that plot is probably not what you really want, at least it runs so you can edit it.

What's happening: Normally %>% passes the result of the left-hand side (LHS) as the first argument of the right-hand side (RHS). However, if you wrap the RHS in braces, %>% does not insert it automatically; the result is available only where you explicitly write the placeholder . inside the braces. This form is useful for nested sub-pipelines or otherwise complicated calls (like a ggplot chain) that can't be handled by redirecting a single argument with the placeholder. See help('%>%', 'magrittr') for more details and options.
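
A minimal sketch of that behaviour, using plain numeric vectors instead of a plot so it runs with magrittr alone (the names a, b and d are only illustrative). The last case also suggests why the original chain fails: %>% binds more tightly than +, so anything to the right of a + is outside the pipe and a . written there is looked up as an ordinary object.

```r
library(magrittr)

# Default: the LHS becomes the first argument of the RHS call.
a <- c(1, 4, 9) %>% sqrt()        # same as sqrt(c(1, 4, 9))

# Braces: nothing is inserted automatically; . can be used
# anywhere (and more than once) inside the block.
b <- c(1, 4, 9) %>% {
  sum(.) / length(.)
}

# %>% has higher precedence than +, so this parses as
# (c(1, 4, 9) %>% sum()) + 100. Only the part left of the + is piped.
d <- c(1, 4, 9) %>% sum() + 100
```

In the ggplot chain, geom_text(...) plays the role of the 100 here: it is added after the pipe has finished, unless the whole ggplot expression is wrapped in braces.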

alistaire answered Nov 15 '22 15:11


Writing:

geom_text(data = df[df$year == 2016 & df$sub_area == "Activities",],...

instead of

geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),...

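makes it work, but the position of the text still needs adjusting (you should be able to find help on SO for that easily). Note that df here is the original, unfiltered tibble, so in the toy data this condition also selects the Education row; add df$area == "Health" if only the Health label is wanted.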
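
A possible alternative that stays inside one long chain, offered as a sketch and not taken from either answer: a ggplot2 layer's data argument can also be a function, which ggplot2 calls with the plot's data. Because the plot's data is whatever came out of the pipe, the label layer sees the already filtered rows without needing . or a reference to df. The object name p and the shared aes() in ggplot() are choices made for this sketch.

```r
library(dplyr)
library(ggplot2)

df <- tibble(
  area = rep(c("Health", "Education"), 6),
  sub_area = rep(c("Staff", "Projects", "Activities"), 4),
  year = c(rep(2016, 6), rep(2017, 6)),
  value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)

p <- df %>%
  filter(area == "Health") %>%
  ggplot(aes(x = as.factor(year), y = value, color = sub_area)) +
  geom_line(aes(group = sub_area)) +
  geom_point(size = 2) +
  theme_minimal(base_size = 18) +
  # data is a function here: it receives the plot data (Health rows only)
  # and returns the subset that this layer should draw.
  geom_text(
    data = function(d) dplyr::filter(d, year == 2016, sub_area == "Activities"),
    aes(label = area), size = 6, hjust = 1
  )
```

Printing p draws the plot. The text position would still need tuning, as noted above.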

YGS answered Nov 15 '22 14:11