Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter data frame columns based on list values

I have a data frame as such:

df <- data.frame(var1 = c(1,1,3,4,5,6,7,8,9),
       var2 = c(11,11,33,44,55,66,77,88,99),
       var3 = c(111,111,333,444,555,666,777,888,999),
       var4 = c(1111,1111,3333,4444,5555,6666,7777,8888,9999))
> df
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111
3    3   33  333 3333
4    4   44  444 4444
5    5   55  555 5555
6    6   66  666 6666
7    7   77  777 7777
8    8   88  888 8888
9    9   99  999 9999

I would like to filter for specific rows based on multiple column values stored in a list.

For example:

my_list <- list(var1 = 1,
     var2 = 11,
     var3 = 111)
filtered_df <- df %>% filter(var1 == my_list$var1[[1]],
              var2 == my_list$var2[[1]],
              var3 == my_list$var3[[1]])
> filtered_df
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111

Instead of including every variable within filter, can I select the elements within my list and filter the data frame using the names of my list?

like image 398
Ali Avatar asked Mar 20 '20 10:03

Ali


People also ask

How do you filter a DataFrame based on a list?

Use pandas. DataFrame. isin() to filter a DataFrame using a list.

How do I filter a DataFrame based on column values in R?

Any dataframe column in the R programming language can be referenced either through its name df$col-name or using its index position in the dataframe df[col-index]. The cell values of this column can then be subjected to constraints, logical or comparative conditions, and then a dataframe subset can be obtained.


2 Answers

What you suggest is very similar to the natural way of things in data.table:

library(data.table)
setDT(df)
df[my_list, on = .(var1, var2, var3)]

   var1 var2 var3 var4
1:    1   11  111 1111
2:    1   11  111 1111

If you specify your keys first, the filtering is more concise:

setkey(df, var1, var2, var3)
df[my_list]

A base R alternative:

df[rowSums(df[1:3] == my_list) == 3L, ]
like image 146
sindri_baldur Avatar answered Oct 13 '22 17:10

sindri_baldur


A base R solution is this:

First you paste the values in your list together collapsing them by the alternation marker |:

my_list_1 <- paste0(unlist(my_list), collapse = "|")
my_list_1
[1] "1|11|111"

Then, using this alternation string my_list_1 as well as pasting the rows of your dataframe together using apply, you subset your dataframe on those rows that match my_list_1:

df[which(grepl(my_list_1, apply(df, 1, paste0, collapse = " "))),]
  var1 var2 var3 var4
1    1   11  111 1111
2    1   11  111 1111
like image 30
Chris Ruehlemann Avatar answered Oct 13 '22 17:10

Chris Ruehlemann