
In R: subset or dplyr::filter with variable from vector

Tags: r, dplyr, subset

df <- 
  data.frame(a=LETTERS[1:4],
             b=rnorm(4)
             )

vals <- c("B","D")

I can filter/subset df by the values in vals with:

dplyr::filter(df, a %in% vals)
subset(df, a %in% vals)

Both give:

  a         b
2 B 0.4481627
4 D 0.2916513

What if I have the variable name as a string in a vector, e.g.:

> names(df)[1]
[1] "a"

Then it doesn't work, I guess because the name is a quoted string:

dplyr::filter(df, names(df)[1] %in% vals)
[1] a b
<0 rows> (or 0-length row.names)
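
A quick base-R check suggests why no rows come back: the condition tests the string "a" itself against vals rather than the contents of column a. This is only a sketch; by_name and by_column are illustrative names, not part of the original question.

```r
df <- data.frame(a = LETTERS[1:4], b = rnorm(4))
vals <- c("B", "D")

# names(df)[1] is just the string "a", so this is a single comparison
by_name <- names(df)[1] %in% vals
by_name
# [1] FALSE

# Extracting the column with [[ gives one TRUE/FALSE per row
by_column <- df[[names(df)[1]]] %in% vals
by_column
# [1] FALSE  TRUE FALSE  TRUE
```

A single FALSE used as the filter condition keeps no rows, which matches the empty result above.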

How do you filter on a column whose name is stored as a string?

UPDATE (what if it's dplyr::tbl_df(df)?)

The answers below work for plain data.frames, but not for data wrapped with dplyr::tbl_df:

df<-dplyr::tbl_df(df)
dplyr::filter(df, df[,names(df)[1]] %in% vals)

This does not work (I thought tbl_df was a simple wrapper around a data.frame?).

Converting back to a data.frame makes it work again:

dplyr::filter(df, as.data.frame(df)[,names(df)[1]] %in% vals)

FINAL UPDATE: It works with tbl_df() using lazyeval::interp

See AndreyAkinshin's solution below.

user3375672 asked Jul 11 '15 15:07



1 Answer

You can use df[,"a"] or df[,1]:

df <- data.frame(a = LETTERS[1:4], b = rnorm(4))
vals <- c("B","D")

dplyr::filter(df, df[,1] %in% vals)
#  a         b
# 2 B 0.4481627
# 4 D 0.2916513

subset(df, df[,1] %in% vals)
#  a         b
# 2 B 0.4481627
# 4 D 0.2916513

dplyr::filter(df, df[,"a"] %in% vals)
#  a         b
# 2 B 0.4481627
# 4 D 0.2916513

subset(df, df[,"a"] %in% vals)
#  a         b
# 2 B 0.4481627
# 4 D 0.2916513

Working with dplyr::tbl_df(df)

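lazyeval::interp lets us build the filter expression from the column name:

df <- dplyr::tbl_df(df)
# Build the expression a %in% c("B", "D") from the column name held as a string
expr <- lazyeval::interp(quote(x %in% y), x = as.name(names(df)[1]), y = vals)

# filter_() is the standard-evaluation variant of filter();
# calling it as dplyr::filter_ avoids needing library(dplyr) for %>%
dplyr::filter_(df, expr)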
# Source: local data frame [2 x 2]
#
#   a        b
# 1 B 0.4481627
# 2 D 0.2916513
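
On dplyr 0.7 and later, where filter_() is deprecated, the same filter can be sketched without lazyeval. This is an addition and not part of the original answer: col, res_pronoun and res_bracket are illustrative names, and .data is the pronoun that dplyr re-exports from rlang.

```r
library(dplyr)

df <- as_tibble(data.frame(a = LETTERS[1:4], b = rnorm(4)))
vals <- c("B", "D")
col <- names(df)[1]  # illustrative: the column name held as a string

# .data[[col]] looks the column up by its string name inside filter()
res_pronoun <- filter(df, .data[[col]] %in% vals)

# [[ returns a plain vector for both data.frame and tibble,
# whereas df[, 1] on a tibble stays a one-column tibble
res_bracket <- filter(df, df[[col]] %in% vals)
```

Both calls should keep the rows where a is "B" or "D". The [[ form also works with subset() on a plain data.frame.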
AndreyAkinshin answered Oct 20 '22 00:10