Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create dplyr statements to later be evaluated in R

Tags:

r

dplyr

rlang

I want to create a single function called eval_data where the user can input

  1. a list of data frames
  2. a list of dplyr functions to apply to the data frames
  3. a list of columns to select from each dataframe:

This will look something like:

eval_data <- function(data, dplyr_logic, select_vector) {
  data %>%
    # this doesn't work
    eval(dplyr_logic) %>%
    select(
      { select_vector }
    )
}

The dplyr_logic is a list of either:

  1. nothing
  2. a mutate statement
  3. 2 mutate statements
  4. a filter

Input 1: List of data frames:

dd <- list()
dd$data <- list(
  mutate0 = iris,
  mutate1 = iris,
  mutate2= iris,
  filter1 = iris
)

Input 3 Select vector:

select_vec <- list(
  c("Species", "Sepal.Length"),
  c("Species", "New_Column1"),
  c("Species", "New_Column2", "New_Column3"),
  c("Species", "Sepal.Width")
)

Input 2: list of logic to apply to each data frame in the list

logic <- list(
  # do nothing -- this one works
  I(),
  #mutate1
  rlang::expr(mutate(New_Column1 = case_when(
    Sepal.Length > 7 ~'Big',
    Sepal.Length > 6 ~ 'Medium',
    TRUE ~ 'Small'
    )
  )),
  #mutate2
  rlang::expr(mutate(New_Column2 = case_when(
    Sepal.Width > 3.5 ~'Big2',
    Sepal.Width > 3 ~ 'Medium2',
    TRUE ~ 'Small2'
  )) %>%
    mutate(
      New_Column3 = case_when(
        Petal.Width > 2 ~'Big3',
        Petal.Width > 1 ~ 'Medium3',
        TRUE ~ 'Small3'
      )
    )
  ),
  #filter1
  rlang::expr(filter(Sepal.Width > 3))
)

# eval_data(dd$data[[1]], logic[[1]], select_vec[[1]]) works
# eval_data(dd$data[[2]], logic[[2]], select_vec[[2]]) does not

Desired Goal:

pmap(dd$data, logic, select_vec, ~eval_data)

Desired Output

pmap_output <- list(
  iris1 = iris %>% I() %>% select("Species", "Sepal.Length"),

  iris2 = iris %>% 
    mutate(New_Column1 = 
             case_when(
               Sepal.Length > 7 ~'Big',
               Sepal.Length > 6 ~ 'Medium',
               TRUE ~ 'Small')) %>% 
    select("Species", "New_Column1"),

  iris4 = iris %>% 
    mutate(New_Column2 = case_when(
      Sepal.Width > 3.5 ~'Big2',
      Sepal.Width > 3 ~ 'Medium2',
      TRUE ~ 'Small2'
    )) %>%
    mutate(
      New_Column3 = case_when(
        Petal.Width > 2 ~'Big3',
        Petal.Width > 1 ~ 'Medium3',
        TRUE ~ 'Small3'
      )
    ) %>%
    select("Species", "New_Column2", "New_Column3"),

  iris3 = iris %>% filter(Sepal.Width > 3) %>% select("Species", "Sepal.Width")
)

What do I need to change in eval_data and the logic list in order to make this work? Any help appreciated!!

like image 756
MayaGans Avatar asked Nov 30 '25 03:11

MayaGans


1 Answers

Two changes. First, you need to include data %>% into your dplyr logic evaluation:

eval_data <- function(data, dplyr_logic, select_vector) {
    rlang::expr( data %>% !!dplyr_logic ) %>%
        eval() %>%
        select( one_of(select_vector) )
}

Second, the chained mutate is actually a bit tricky. Recall that x %>% f(y) can be rewritten as f(x,y). Your double-mutate expression can therefore be re-written as mutate( mutate(expr1), expr2 ). When you feed the data to it, it becomes

mutate(data, mutate(expr1), expr2)

instead of the desired

mutate(mutate(data, expr1), expr2)

So, we need to use the pronoun . to specify where the pipe input should go to in our complex expression:

logic <- rlang::exprs(                # We can use exprs instead of list(expr())
  I(),
  mutate(New_Column1 = case_when(
    Sepal.Length > 7 ~'Big',
    Sepal.Length > 6 ~ 'Medium',
    TRUE ~ 'Small'
    )),
  {mutate(., New_Column2 = case_when(       # <--- NOTE the { and the .
    Sepal.Width > 3.5 ~'Big2',
    Sepal.Width > 3 ~ 'Medium2',
    TRUE ~ 'Small2')) %>%
    mutate(
      New_Column3 = case_when(
        Petal.Width > 2 ~'Big3',
        Petal.Width > 1 ~ 'Medium3',
        TRUE ~ 'Small3'
      ))},                                  # <--- NOTE the matching }
  filter(Sepal.Width > 3)
)

Everything works now:

res <- pmap(list(dd$data, logic, select_vec), eval_data)

## Compare to desired output
map2_lgl( res, pmap_output, identical )
#  mutate0 mutate1 mutate2 filter1
#     TRUE    TRUE    TRUE    TRUE
like image 55
Artem Sokolov Avatar answered Dec 02 '25 18:12

Artem Sokolov