
dplyr filter on a vector rather than a dataframe in R

Tags: r, dplyr

This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from it. However, I want to avoid the vector[vector != "thiselement"] notation for a variety of reasons. In particular, here is what I am trying to do:

# this doesn't work
all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")

# this doesn't work either
all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")

# this does work, but I want to avoid this approach to filtering
all_states = gsub(" ", "-", tolower(state.name))
all_states = all_states[all_states != "alaska"]

Can this be done in a simple manner? Thanks in advance for the help!

EDIT: The reason I'm struggling with this is that everything I find online is about filtering on a column of a data frame, for example:

my_df %>% filter(col != "alaska")

However, I'm working with a vector here, not a data frame.

Canovice asked May 24 '17 21:05




1 Answer

Sorry for posting on a 5-month-old question, but I'd like to archive a simpler solution.

With dplyr loaded, you can filter a character vector in the following ways:

> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"

The first approach lets you filter with a regular expression (matches() returns the positions of the elements that match), and the second is shorter. Both work because dplyr re-exports the %>% operator from magrittr, and the placeholder . comes with it, even though dplyr does not make magrittr's other helpers, such as extract, available.
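Applied to the vector from the question, the second approach might look like the sketch below. It loads magrittr directly, on the assumption that only the pipe is needed (library(dplyr) provides the same %>%); the variable name all_states_2 is only for illustration. The second pipeline calls magrittr::extract(), an alias for `[`, with its namespace spelled out because dplyr does not export it.

```r
library(magrittr)  # dplyr re-exports the same %>% operator

# Build the lower-case, hyphenated state names and drop "alaska" in one pipeline.
# The dot is a direct argument of gsub() and of `[`, so the piped vector
# is substituted exactly where the dot appears.
all_states <- state.name %>%
  tolower() %>%
  gsub(" ", "-", .) %>%
  .[. != "alaska"]

# Same filter written with magrittr::extract(), which is an alias for `[`.
# Here the dot is nested inside `!=`, so the vector is still inserted
# as the first argument of extract().
all_states_2 <- state.name %>%
  tolower() %>%
  gsub(" ", "-", .) %>%
  magrittr::extract(. != "alaska")
```

Both pipelines leave the original state.name untouched and return a plain character vector, so no data frame is involved at any point.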

Details of the placeholder . can be found in the help page for the forward-pipe operator %>%. The placeholder has three main uses:

  • Using the dot for secondary purposes
  • Using lambda expressions with %>%
  • Using the dot-place holder as lhs

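Strictly speaking, the examples above rely on a more basic rule from the same help page, placing the lhs elsewhere in the rhs call: the dot appears as a direct argument of the [ call, so %>% substitutes the piped vector for each . instead of inserting it as the first argument.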
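To make the three listed uses concrete, here is a minimal sketch on the same toy vector. The variable names (all_but_last, without_a, drop_a) are made up for illustration and do not come from the magrittr documentation.

```r
library(magrittr)

x <- c("A", "B", "C", "D")

# Dot for secondary purposes: the dot is nested inside length(), so x is
# still inserted as the first argument, i.e. head(x, length(x) - 1).
all_but_last <- x %>% head(length(.) - 1)

# Lambda expression: braces give full control over where the dot is used.
without_a <- x %>% { .[. != "A"] }

# Dot as lhs: the pipeline is not evaluated, it becomes a reusable function.
drop_a <- . %>% .[. != "A"]
without_a_again <- drop_a(x)
```

The last form is handy when the same filter has to be applied to several vectors, since drop_a can be passed around like any other function.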

Quar answered Sep 21 '22 05:09