How to specify "does not contain" in dplyr filter

Tags: r, filter, dplyr

I am quite new to R.

Using the table SE_CSVLinelist_clean, I want to extract the rows where the variable where_case_travelled_1 does NOT contain the strings "Outside Canada" or "Outside province/territory of residence but within Canada", and store the result in a new table called SE_CSVLinelist_filtered.

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
where_case_travelled_1 %in% -c('Outside Canada','Outside province/territory of residence but within Canada'))

The code above runs when I use "c" instead of "-c", but that selects exactly the rows I want to drop.
So how do I write the filter when I want to exclude the rows that contain either of those strings?

asked Dec 23 '15 21:12 by ayk



4 Answers

Note that %in% returns a logical vector of TRUE and FALSE. To negate it, you can use ! in front of the logical statement:

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
 !where_case_travelled_1 %in% 
   c('Outside Canada','Outside province/territory of residence but within Canada'))

Regarding your original approach with -c(...), - is a unary operator that "performs arithmetic on numeric or complex vectors (or objects which can be coerced to them)" (from help("-")). Since you are dealing with a character vector that cannot be coerced to numeric or complex, you cannot use -.
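A minimal base R sketch of the two points above, using made-up values (the vectors `travel` and `excluded` are hypothetical stand-ins, not the asker's data). Because %in% binds more tightly than !, the expression !x %in% y is read as !(x %in% y). Note also that NA is not in the list, so NA entries end up being kept.

```r
# Hypothetical values standing in for the where_case_travelled_1 column
travel <- c(
  "Outside Canada",
  "Within province/territory of residence",
  NA,
  "Outside province/territory of residence but within Canada"
)
excluded <- c(
  "Outside Canada",
  "Outside province/territory of residence but within Canada"
)

# Parsed as !(travel %in% excluded); NA %in% excluded is FALSE, so NA is kept
keep <- !travel %in% excluded
print(keep)  # FALSE TRUE TRUE FALSE

# Unary minus on a character vector raises an error; capture its message
minus_result <- tryCatch(-excluded, error = function(e) conditionMessage(e))
print(minus_result)
```

If rows with a missing value should be dropped as well, adding `& !is.na(where_case_travelled_1)` to the filter condition is one way to do it.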

answered Oct 24 '22 18:10 by fishtank


Try wrapping the condition in parentheses, as shown below. The parenthesized expression is TRUE for rows whose value is one of the options in the vector. Comparing that result to FALSE then keeps only the rows that do not match any of the options.

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
(where_case_travelled_1 %in% c('Outside Canada','Outside province/territory of residence but within Canada')) == FALSE)

answered Oct 24 '22 17:10 by BWO


Just be careful with the previous solutions: %in% compares whole values, so you have to type out EXACTLY the strings you want to exclude.

Ask yourself whether the word "Outside", for example, is sufficient. If so, you can use str_detect() from the stringr package:

library(dplyr)
library(stringr)
data_filtered <- data %>% 
  filter(!str_detect(where_case_travelled_1, "Outside"))

A reprex version:

iris

iris %>% 
  filter(!str_detect(Species, "versicolor"))
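One difference from the %in% approach is worth checking on real data: str_detect() returns NA for a missing value, and filter() drops rows where the condition is NA. The sketch below illustrates this with a small made-up data frame (`cases` and its contents are hypothetical, standing in for SE_CSVLinelist_clean), assuming dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for SE_CSVLinelist_clean
cases <- data.frame(
  id = 1:4,
  where_case_travelled_1 = c(
    "Outside Canada",
    "Within province/territory of residence",
    NA,
    "Outside province/territory of residence but within Canada"
  ),
  stringsAsFactors = FALSE
)

# The NA row is dropped here, because !str_detect(NA, "Outside") is NA
no_outside <- cases %>%
  filter(!str_detect(where_case_travelled_1, "Outside"))

# Keep rows with a missing value explicitly, if they should survive the filter
no_outside_keep_na <- cases %>%
  filter(is.na(where_case_travelled_1) |
           !str_detect(where_case_travelled_1, "Outside"))

print(no_outside$id)          # 2
print(no_outside_keep_na$id)  # 2 3
```

Whether missing values belong in the filtered table depends on the analysis; the point is only that the two approaches treat them differently.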

answered Oct 24 '22 17:10 by Austin


Quick fix. First define the opposite of %in%:

  '%ni%' <- Negate("%in%")

Then apply:

SE_CSVLinelist_filtered <- filter(
    SE_CSVLinelist_clean, 
    where_case_travelled_1 %ni% c('Outside Canada',
      'Outside province/territory of residence but within Canada'))
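As a quick check that the custom operator behaves like !x %in% y, here is a base R sketch on made-up values (the vectors `travel` and `excluded` are hypothetical). It uses backticks around the operator names, which is an equivalent way of writing the definition above.

```r
# Define the negated operator; Negate() wraps %in% and flips its result
`%ni%` <- Negate(`%in%`)

# Hypothetical values standing in for the where_case_travelled_1 column
travel <- c("Outside Canada", "No travel", NA)
excluded <- c(
  "Outside Canada",
  "Outside province/territory of residence but within Canada"
)

not_excluded <- travel %ni% excluded
print(not_excluded)  # FALSE TRUE TRUE

# Same result as negating %in% directly
same <- identical(not_excluded, !travel %in% excluded)
print(same)  # TRUE
```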

answered Oct 24 '22 17:10 by ToWii