 

Remove row if any column contains a specific string

Tags:

r

I am trying to figure out the best approach in R to remove rows that contain a specific string, in my case 'no_data'.

I have data from an outside source that replaces NAs with 'no_data'.

An example:

 time  |speed  |wheels
1:00   |30     |no_data
2:00   |no_data|18
no_data|no_data|no_data
3:00   |50     |18

I want to go through the data and remove every row that contains the 'no_data' string in any column. I have had a lot of trouble figuring this out. I have tried sapply, filter, grep and combinations of the three. I am by no means an R expert, so it could just be me using these incorrectly. Any help would be appreciated.

asked Jun 14 '17 12:06 by lentz


1 Answer

Two dplyr options (using the data frame df1 from Akrun's answer):

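library(dplyr)

## df1: the example table from the question, with character columns
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)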

## using if_all(); across() inside filter() is deprecated in favour of if_all()/if_any()

df1 %>% filter(if_all(everything(), ~ !grepl("no_data", .)))
#>   time speed wheels
#> 1 3:00    50     18

## with the superseded filter_all

df1 %>% filter_all(all_vars(!grepl("no_data", .)))
#>   time speed wheels
#> 1 3:00    50     18
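For comparison, here is a sketch of the same filter in base R without dplyr. It is not part of the original answer. It rebuilds the question's table as df1 with character columns, which is an assumption about how the data was read. `df1 == "no_data"` tests whole cells for an exact match. The `grepl()` variant mirrors the dplyr code and also catches cells that merely contain the string.

```r
# Question's example data; all columns assumed to be character
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)

# Exact match: a logical matrix, TRUE where a cell equals "no_data"
is_missing <- df1 == "no_data"
# Keep rows with zero matching cells
clean_exact <- df1[rowSums(is_missing) == 0, , drop = FALSE]

# Substring match, like grepl() in the dplyr version:
# one logical vector per column, combined with OR across columns
has_string <- Reduce(`|`, lapply(df1, function(col) grepl("no_data", col)))
clean_grepl <- df1[!has_string, , drop = FALSE]

print(clean_exact)
```

Unlike dplyr, base R subsetting keeps the original row names, so the surviving row is labelled 4 rather than 1.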

Caveat:
This negated form works for removing every row that contains the string. To keep only the rows that contain it, simply dropping the ! is not enough: all_vars(grepl('no_data', .)) keeps only the rows in which every column contains the string. In that case, use filter_all(any_vars(grepl('no_data', .))) instead, or, with current dplyr, filter(if_any(everything(), ~ grepl('no_data', .))).
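The difference between "every column matches" and "any column matches" can be checked in base R. The following is a minimal sketch on the question's data, again assuming character columns. Only the all-'no_data' row passes the "every column" test, while three rows pass the "any column" test.

```r
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)

# Logical matrix (rows x columns): TRUE where a cell contains "no_data"
hits <- sapply(df1, function(col) grepl("no_data", col))

# Analogue of all_vars(grepl(...)): every column must match
rows_all <- df1[rowSums(hits) == ncol(df1), , drop = FALSE]

# Analogue of any_vars(grepl(...)): at least one column matches
rows_any <- df1[rowSums(hits) > 0, , drop = FALSE]

nrow(rows_all)  # 1
nrow(rows_any)  # 3
```

sapply() only returns a matrix here because df1 has more than one row. For a one-row data frame it would return a plain vector, and rowSums() would fail.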

answered Sep 20 '22 12:09 by tjebo