I am trying to delete specific rows in my dataset based on values in multiple columns. A row should be deleted only when a condition in all 3 columns is met.
This is my code:
test_dff %>%
filter(contbr_nm != c('GAITHER, BARBARA', 'PANIC, RADIVOJE', 'KHAN, RAMYA') &
contbr_city != c('APO AE', 'PORSGRUNN', 'NEW YORK') &
contbr_zip != c('9309', '3924', '2586'))
This code should remove 12 rows from my table. Instead, it removes the vast majority of them. I suspect that it removes a row whenever any one of the conditions is met.
Is there a better solution, or do I have to use the approach described here?
Do I need to specify each combination separately, like so? This approach also deletes far too many rows, so it is wrong as well.
test_dff %>%
filter((contbr_nm != 'GAITHER, BARBARA' & contbr_city != 'APO AE' & contbr_zip != '9309') &
(contbr_nm != 'PANIC, RADIVOJE' & contbr_city != 'PORSGRUNN' & contbr_zip != '3924') &
(contbr_nm != 'KHAN, RAMYA' & contbr_city != 'NEW YORK' & contbr_zip != '2586') )
If I focus on deleting rows only based on one variable, this piece of code works:
test_dff %>%
filter(contbr_zip != c('9309')) %>%
filter(contbr_zip != c('3924')) %>%
filter(contbr_zip != c('2586'))
Why does such an approach not work?
test_dff %>%
filter(contbr_zip != c('9309','3924','2586'))
Thanks a lot for your help.
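On the last question: `!=` compares two vectors position by position and recycles the shorter one, so it does not test "is not any of these values". The sketch below uses a made-up vector of zip codes (not the real data) to show the difference between `!=` and `%in%`.

```r
# Made-up zip codes for illustration only.
zip <- c("3924", "9309", "2586", "1111", "9309", "3924")
bad <- c("9309", "3924", "2586")

# `!=` recycles `bad` along `zip`: zip[1] is compared with bad[1],
# zip[2] with bad[2], zip[3] with bad[3], zip[4] with bad[1] again, and so on.
recycled <- zip != bad

# `%in%` checks each element of `zip` against the whole of `bad`.
not_listed <- !(zip %in% bad)

print(recycled)
print(not_listed)
```

Only the fourth element is truly absent from `bad`, yet the recycled comparison marks five of the six elements as "different". Filtering on `!(contbr_zip %in% c('9309', '3924', '2586'))` avoids this for a single column.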
To remove rows from a data frame based on multiple conditions, use square brackets [ ] with the data frame and put the conditions, combined with the AND (&) or OR (|) operator, inside them. This slices the data frame and keeps only the rows that satisfy the given conditions.
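A minimal sketch of the bracket approach, using an invented toy data frame (the names `df`, `drop_row` and `kept` are placeholders, not from the question). The idea is to build the "all columns match" condition first and then negate it as a whole.

```r
# Toy data; names and values are invented for illustration.
df <- data.frame(
  name = c("A", "A", "B", "C"),
  city = c("X", "Y", "X", "Z"),
  zip  = c("1", "1", "2", "3"),
  stringsAsFactors = FALSE
)

# TRUE only for rows where all three columns match the unwanted combination.
drop_row <- df$name == "A" & df$city == "X" & df$zip == "1"

# Keep every other row; the trailing comma selects all columns.
kept <- df[!drop_row, ]
print(kept)
```

Note that if the columns contain NA, the condition can evaluate to NA, and bracket indexing then returns a row of NAs rather than dropping the row, so it may be worth handling missing values first.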
For example, we can use the subset() function to drop rows based on a condition. If we prefer to work with the tidyverse, we can use dplyr's filter() function to remove (or select) rows based on values in a column; it works conditionally, in the same way as subset().
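Applied to the question, a condition such as `a != x & b != y & c != z` keeps only rows where all three columns differ, so it drops a row as soon as any one column matches. To drop a row only when all three match, negate the whole conjunction instead. The sketch below assumes the question's column names; the rows in `donors` are invented.

```r
library(dplyr)

# Toy data; column names follow the question, the rows are invented.
donors <- data.frame(
  contbr_nm   = c("GAITHER, BARBARA", "GAITHER, BARBARA", "KHAN, RAMYA", "DOE, JANE"),
  contbr_city = c("APO AE", "NEW YORK", "NEW YORK", "NEW YORK"),
  contbr_zip  = c("9309", "1000", "2586", "2586"),
  stringsAsFactors = FALSE
)

# dplyr: one negated "all three match" condition per unwanted combination.
# Conditions separated by commas are combined with AND.
kept_dplyr <- donors %>%
  filter(
    !(contbr_nm == "GAITHER, BARBARA" & contbr_city == "APO AE" & contbr_zip == "9309"),
    !(contbr_nm == "KHAN, RAMYA" & contbr_city == "NEW YORK" & contbr_zip == "2586")
  )

# Base R: subset() accepts the same logical expression.
kept_base <- subset(
  donors,
  !(contbr_nm == "GAITHER, BARBARA" & contbr_city == "APO AE" & contbr_zip == "9309") &
    !(contbr_nm == "KHAN, RAMYA" & contbr_city == "NEW YORK" & contbr_zip == "2586")
)

print(kept_dplyr)
```

The second "GAITHER, BARBARA" row and the "DOE, JANE" row survive, because each matches an unwanted combination in at most one or two columns, not all three.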
Here is a join-based approach - all items must be exact matches.
main <- read.csv(text = "
id,name,city,zip
1,mary,new york,10017
2,jonah,new york,10016
3,tamil,manhattan,10019
4,vijay,harlem,10028
")
excludes <- read.csv(text = "
name,city,zip
jonah,new york,10016
vijay,harlem,10028
")
library(dplyr)
anti_join(main, excludes)
# id name city zip
# 1 3 tamil manhattan 10019
# 2 1 mary new york 10017
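The same idea can be sketched with the column names from the question. The rows in `donors` are invented; the `excludes` table holds the three combinations listed in the question. Passing `by` explicitly makes it clear which columns must all match, and the key columns should have the same type in both tables (here all character).

```r
library(dplyr)

# Toy data; column names follow the question, the rows are invented.
donors <- data.frame(
  contbr_nm   = c("GAITHER, BARBARA", "GAITHER, BARBARA", "KHAN, RAMYA", "DOE, JANE"),
  contbr_city = c("APO AE", "NEW YORK", "NEW YORK", "NEW YORK"),
  contbr_zip  = c("9309", "1000", "2586", "2586"),
  stringsAsFactors = FALSE
)

# One row per combination to exclude; a row is dropped only if all three match.
excludes <- data.frame(
  contbr_nm   = c("GAITHER, BARBARA", "PANIC, RADIVOJE", "KHAN, RAMYA"),
  contbr_city = c("APO AE", "PORSGRUNN", "NEW YORK"),
  contbr_zip  = c("9309", "3924", "2586"),
  stringsAsFactors = FALSE
)

kept <- anti_join(donors, excludes,
                  by = c("contbr_nm", "contbr_city", "contbr_zip"))
print(kept)
```

This scales well when the list of combinations is long, since new exclusions are added as rows of data rather than as extra conditions.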
Here's an approach that creates a new variable by concatenating the values in the multiple columns you want to reference with your filter:
library(dplyr)  # provides %>%, filter() and select()
set.seed(15)
dfTest <- data.frame(matrix(round(rnorm(20),3), nrow=10))
dfTest$tempcol <- paste(dfTest$X1,dfTest$X2)
head(dfTest)
#       X1     X2       tempcol
# 1  0.259  0.855   0.259 0.855
# 2  1.831 -0.365  1.831 -0.365
# 3 -0.340  0.166   -0.34 0.166
# 4  0.897 -1.243  0.897 -1.243
# 5  0.488  1.459   0.488 1.459
# 6 -1.255 -0.004 -1.255 -0.004
#Now remove the values by filtering on tempcol
dfTest %>%
filter(tempcol != '0.259 0.855') %>%
select(1:2) #omit tempcol in output
#       X1     X2
# 1  1.831 -0.365
# 2 -0.340  0.166
# 3  0.897 -1.243
# 4  0.488  1.459
# 5 -1.255 -0.004
# 6  0.023 -0.021
# 7  1.091  0.032
# 8 -0.132 -1.167
# 9 -1.075 -0.520
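To exclude several combinations at once with this approach, the concatenated key can be tested with `%in%`. The following is a rough sketch on invented data; the `|` separator is an assumption and should be a character that never occurs in the columns themselves. As the output above shows, numbers may change form when pasted (`-0.340` becomes `-0.34`), so the approach is safer with text columns.

```r
# Toy data; names and values are invented for illustration.
df <- data.frame(
  name = c("A", "A", "B", "C"),
  city = c("X", "Y", "X", "Z"),
  zip  = c("1", "1", "2", "3"),
  stringsAsFactors = FALSE
)

# Build one key per row from the columns that must all match.
key <- paste(df$name, df$city, df$zip, sep = "|")

# Keys of the combinations to exclude, written in the same format.
drop_keys <- c("A|X|1", "C|Z|3")

# Keep rows whose key is not in the exclusion list.
kept <- df[!(key %in% drop_keys), ]
print(kept)
```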