
Regular expressions (RegEx) and dplyr::filter()

Tags: regex, r, dplyr

I have a simple data frame that looks like this:

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df = data.frame(x, y)

   x   y
1 aa 101
2 aa 102
3 aa 113
4 bb 201
5 cc 202
6 cc 344
7 cc 407

I would like to use dplyr::filter() and a RegEx to filter out all the y observations that start with the number 1.

I'm imagining that the code will look something like this:

df %>%
  filter(y != grep("^1"))

But I am getting an error: Error in grep("^1") : argument "x" is missing, with no default

asked Mar 04 '15 16:03 by emehex



1 Answer

You need to double-check the documentation for grepl and filter.

For grep/grepl you also have to supply the vector that you want to search (y in this case), and filter takes a logical vector (i.e. you need to use grepl). If you want to supply an index vector (from grep), you can use slice instead.

df %>% filter(!grepl("^1", y)) 

Or with an index derived from grep:

df %>% slice(grep("^1", y, invert = TRUE)) 
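
To see why filter pairs with grepl while slice pairs with grep, it may help to look at what each function returns for y. This is only a minimal sketch in base R using the y vector from the question; the variable names are made up for illustration:

```r
y <- c(101, 102, 113, 201, 202, 344, 407)

# grepl() returns one TRUE/FALSE per element, which is what filter() expects
starts_with_1 <- grepl("^1", y)
print(starts_with_1)    # TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# grep() returns the positions of the matching elements, which is what slice() expects
match_positions <- grep("^1", y)
print(match_positions)  # 1 2 3

# invert = TRUE returns the positions that do NOT match
keep_positions <- grep("^1", y, invert = TRUE)
print(keep_positions)   # 4 5 6 7
```

Both functions coerce the numeric y to character before matching, so "^1" is tested against strings such as "101".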

But you can also just use substr because you are only interested in the first character:

df %>% filter(substr(y, 1, 1) != 1) 
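
The three variants can also be checked against each other. The following is a rough sketch, assuming dplyr is installed; the names via_grepl, via_slice and via_substr are made up for illustration, and the substr variant compares against the string "1" rather than the number 1 only to make the character comparison explicit:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  x = c("aa", "aa", "aa", "bb", "cc", "cc", "cc"),
  y = c(101, 102, 113, 201, 202, 344, 407)
)

# Logical vector from grepl(), negated to drop the matches
via_grepl <- df %>% filter(!grepl("^1", y))

# Row positions from grep(), inverted to keep the non-matches
via_slice <- df %>% slice(grep("^1", y, invert = TRUE))

# First character only, compared as a string
via_substr <- df %>% filter(substr(y, 1, 1) != "1")

# All three keep the same y values
stopifnot(
  identical(via_grepl$y, via_slice$y),
  identical(via_grepl$y, via_substr$y)
)

print(via_grepl$y)  # 201 202 344 407
```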
answered Sep 21 '22 09:09 by talat