 

unquote string as variable in pipe

Tags: r, dplyr

I want to remove duplicate rows from a data frame, considering only specific columns. That can be done with distinct():

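library(dplyr)
data <- tibble(a = c(1, 1, 2, 2), b = c(3, 3, 3, 4), z = c(5, 4, 5, 5))
# keep the first row of each unique (a, b) combination, plus all other columns
filtered_data <- data %>% distinct(a, b, .keep_all = TRUE)
dim(filtered_data)
# [1] 3 3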

This is (almost) what I need. My problem, however, is that the column names I need to pass to distinct will change. So I have a character vector gen that contains the names of the columns I want to use with distinct. They need to be unquoted to be usable in the pipe. I found suggestions to use as.name() or eval(parse()). This, however, gives me a different result:

gen <- c("a", "b")
filtered_data <- data %>% distinct(eval(parse(text = gen)), .keep_all = TRUE)
dim(filtered_data)
# [1] 2 4

The eval seems to do something odd to how the data is filtered (and it adds an extra column, though I could live with that). So how can I get the same result as with a, b, but using a variable instead?
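A minimal sketch of why the eval(parse()) attempt behaves this way, assuming the same data and gen as above. parse(text = gen) produces a two-element expression vector, and eval() evaluates each element in turn but returns only the value of the last one. Inside distinct() the call therefore reduces to a computed copy of column b; distinct() adds that computed column (named after the expression) and deduplicates on it alone, which appears to explain both the 2 rows and the fourth column:

```r
library(dplyr)

data <- tibble(a = c(1, 1, 2, 2), b = c(3, 3, 3, 4), z = c(5, 4, 5, 5))
gen <- c("a", "b")

# parse() yields an expression vector with two elements: a and b
exprs <- parse(text = gen)
length(exprs)

# eval() evaluates each element in turn but returns only the LAST value,
# so the result is just the contents of column b
last_value <- eval(exprs, envir = data)
identical(last_value, data$b)

# deduplicating on b alone leaves 2 rows, matching the 2-row result above
nrow(distinct(data, b))
```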

Additional information: I actually obtain gen by reading the column names of a data frame: gen <- colnames(data)[1:2]. The solution suggested by @gymbrane would be perfect if I had a way to transform gen into c(a, b). The whole point is to avoid hard-coding the column names. I tried things like gen <- noquote(gen), which does not give an error in the rm_dup_rows function suggested below, but it does give a different result, with the same sort of repeated filtering I started with...
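For the conversion asked about here, rlang::syms() (rlang is installed as a dependency of dplyr) turns a character vector into a list of symbols, which the !!! operator can splice into distinct() as if the names had been typed by hand. A sketch assuming the same data as above; it is not taken from either post:

```r
library(dplyr)

data <- tibble(a = c(1, 1, 2, 2), b = c(3, 3, 3, 4), z = c(5, 4, 5, 5))

# column names arrive as strings, here read from the data itself
gen <- colnames(data)[1:2]

# rlang::syms() converts c("a", "b") into a list of symbols (a, b);
# !!! splices them into distinct() as separate arguments
filtered_data <- data %>% distinct(!!!rlang::syms(gen), .keep_all = TRUE)
dim(filtered_data)
# [1] 3 3
```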

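Fixed: I think I got it working. It may be inelegant, and I'm not sure every step is necessary, but it seems to work by combining the function provided by @gymbrane below with a string-to-symbol conversion and quosures in a for loop, adding each result to a list (edit: storing the list in GlobalEnv isn't necessary):

unquote_string <- function(string) {
  out <- list()
  for (i in seq_along(string)) {
    # rlang::sym() turns a string such as "a" into the symbol a
    # (ensym() also accepts strings, but it is meant for capturing function arguments)
    t <- rlang::sym(string[[i]])
    # wrap the symbol in a quosure so it can be spliced with !!! later
    out[[i]] <- rlang::quo(!!t)
  }
  return(out)
}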
gen_quo <- unquote_string(gen)
filtered_data <- rm_dup_rows(data, gen_quo)
dim(filtered_data)
# [1] 3 3 
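For comparison, the loop in unquote_string() can be written with lapply(). This is a hypothetical variant (unquote_string2 is a name introduced here), paired with the vars version of rm_dup_rows() from the answer below so the snippet runs on its own:

```r
library(dplyr)

data <- tibble(a = c(1, 1, 2, 2), b = c(3, 3, 3, 4), z = c(5, 4, 5, 5))

# rm_dup_rows() as in the answer's second version (takes a list of quosures)
rm_dup_rows <- function(data, vars) {
  data %>% distinct(!!!vars, .keep_all = TRUE)
}

# hypothetical loop-free variant of unquote_string(): rlang::sym() turns
# each string into a symbol and rlang::quo() wraps it in a quosure
unquote_string2 <- function(string) {
  lapply(string, function(s) rlang::quo(!!rlang::sym(s)))
}

gen <- colnames(data)[1:2]
gen_quo <- unquote_string2(gen)
filtered_data <- rm_dup_rows(data, gen_quo)
dim(filtered_data)
```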
asked Oct 16 '22 at 15:10 by raoul


1 Answer

How about creating a function that uses quosures? Perhaps something like this is what you are looking for:

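rm_dup_rows <- function(data, ...){
  # capture the column names passed in ... without evaluating them
  vars <- dplyr::quos(...)
  # splice them into distinct() as separate arguments
  data %>% distinct(!!!vars, .keep_all = TRUE)
}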

I believe this returns what you are asking for:

rm_dup_rows(data = data, a, b)
# A tibble: 3 x 3
      a     b     z
  <dbl> <dbl> <dbl>
      1     3     5
      2     3     5
      2     4     5


rm_dup_rows(data, b, z)
# A tibble: 3 x 3
      a     b     z
  <dbl> <dbl> <dbl>
      1     3     5
      1     3     4
      2     4     5
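Because dplyr::quos() (a re-export of rlang::quos()) processes the !!! operator inside the dots, this first version can also take column names stored as strings, by converting them with rlang::syms() at the call site. A sketch under that assumption; gen here is just an example vector:

```r
library(dplyr)

data <- tibble(a = c(1, 1, 2, 2), b = c(3, 3, 3, 4), z = c(5, 4, 5, 5))

# rm_dup_rows() as in the answer's first version (dots + quos)
rm_dup_rows <- function(data, ...) {
  vars <- dplyr::quos(...)
  data %>% distinct(!!!vars, .keep_all = TRUE)
}

# column names held as strings are converted to symbols and spliced
# into the dots, so quos(...) receives b and z as if typed directly
gen <- c("b", "z")
result <- rm_dup_rows(data, !!!rlang::syms(gen))
result
```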

Additional

You could modify rm_dup_rows slightly and construct your vector of column names with quos(). Something like this:

rm_dup_rows <- function(data, vars){
  # vars is a list of quosures; !!! splices it into distinct()
  data %>% distinct(!!!vars, .keep_all = TRUE)
}

# capture the column names as a list of quosures
gen <- quos(a, z)

rm_dup_rows(data, gen)
# A tibble: 3 x 3
      a     b     z
  <dbl> <dbl> <dbl>
      1     3     5
      1     3     4
      2     3     5
answered Oct 21 '22 at 00:10 by gymbrane