I am trying to figure out the best approach in R to remove rows that contain a specific string, in my case 'no_data'.
I have data from an outside source that fills in NAs with 'no_data'.
An example:
time |speed |wheels
1:00 |30 |no_data
2:00 |no_data|18
no_data|no_data|no_data
3:00 |50 |18
I want to go through the data and remove every row that contains the 'no_data' string in any column. I have had a lot of trouble figuring this out. I have tried sapply, filter, grep and combinations of the three. I am by no means an R expert, so I may just be using these incorrectly. Any help would be appreciated.
Two dplyr options (using Akrun's data from this answer):
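The linked data is not reproduced here. As a minimal sketch, assuming df1 simply mirrors the table in the question (the construction below is a hypothetical reconstruction, not Akrun's original code), it could be built like this:

```r
# Hypothetical reconstruction of df1 from the table in the question.
# Every column is character because "no_data" is mixed in with the values.
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)
```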
library(dplyr)
## using across() (since deprecated inside filter(); if_all() is the replacement)
df1 %>% filter(across(everything(), ~ !grepl("no_data", .)))
#> time speed wheels
#> 1 3:00 50 18
## with the superseded filter_all
df1 %>% filter_all(all_vars(!grepl("no_data", .)))
#> time speed wheels
#> 1 3:00 50 18
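A third variant, sketched with if_all(), which current dplyr recommends for this kind of row filter. The df1 below is an assumed reconstruction of the question's table, not the original data.

```r
library(dplyr)

# Assumed stand-in for df1, rebuilt from the table in the question
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)

# if_all() keeps a row only when the predicate is TRUE in every column,
# so any row with "no_data" anywhere is dropped
clean <- df1 %>% filter(if_all(everything(), ~ !grepl("no_data", .x)))
```

Note that grepl() matches substrings, so a value such as "no_data_yet" would also be removed. If only exact matches should count, comparing with .x != "no_data" may be the better fit.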
Caveat:
This only works if you want to remove all rows with that string. If you instead want to get all rows with this string, all_vars(grepl('no_data', .)) (without the !) would not be sufficient: it would only return the rows where every column contains the string. In that case, use filter_all(any_vars()) instead.
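To make the difference concrete, here is a small sketch using if_any() and if_all(), the current counterparts of any_vars() and all_vars(). The df1 below is again an assumed reconstruction of the question's table.

```r
library(dplyr)

# Assumed stand-in for df1, rebuilt from the table in the question
df1 <- data.frame(
  time   = c("1:00", "2:00", "no_data", "3:00"),
  speed  = c("30", "no_data", "no_data", "50"),
  wheels = c("no_data", "18", "no_data", "18"),
  stringsAsFactors = FALSE
)

# Rows where at least one column contains the string
with_string <- df1 %>% filter(if_any(everything(), ~ grepl("no_data", .x)))

# Rows where every column contains the string
all_string <- df1 %>% filter(if_all(everything(), ~ grepl("no_data", .x)))
```

With this data, with_string holds the first three rows of the table, while all_string holds only the row that is 'no_data' in every column.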