I have a data frame as such:
df <- data.frame(var1 = c(1,1,3,4,5,6,7,8,9),
       var2 = c(11,11,33,44,55,66,77,88,99),
       var3 = c(111,111,333,444,555,666,777,888,999),
       var4 = c(1111,1111,3333,4444,5555,6666,7777,8888,9999))
> df
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111
3    3   33  333 3333
4    4   44  444 4444
5    5   55  555 5555
6    6   66  666 6666
7    7   77  777 7777
8    8   88  888 8888
9    9   99  999 9999
I would like to filter for specific rows based on multiple column values stored in a list.
For example:
my_list <- list(var1 = 1,
     var2 = 11,
     var3 = 111)
filtered_df <- df %>% filter(var1 == my_list$var1[[1]],
              var2 == my_list$var2[[1]],
              var3 == my_list$var3[[1]])
> filtered_df
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111
Instead of including every variable within filter, can I select the elements within my list and filter the data frame using the names of my list?
Use pandas. DataFrame. isin() to filter a DataFrame using a list.
Any dataframe column in the R programming language can be referenced either through its name df$col-name or using its index position in the dataframe df[col-index]. The cell values of this column can then be subjected to constraints, logical or comparative conditions, and then a dataframe subset can be obtained.
What you suggest is very similar to the natural way of things in data.table:
library(data.table)
setDT(df)
df[my_list, on = .(var1, var2, var3)]
   var1 var2 var3 var4
1:    1   11  111 1111
2:    1   11  111 1111
If you specify your keys first, the filtering is more concise:
setkey(df, var1, var2, var3)
df[my_list]
A base R alternative:
df[rowSums(df[1:3] == my_list) == 3L, ]
                        A base R solution is this:
First you paste the values in your list together collapsing them by the alternation marker |:
my_list_1 <- paste0(unlist(my_list), collapse = "|")
my_list_1
[1] "1|11|111"
Then, using this alternation string my_list_1 as well as pasting the rows of your dataframe together using apply, you subset your dataframe on those rows that match my_list_1:
df[which(grepl(my_list_1, apply(df, 1, paste0, collapse = " "))),]
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With