Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr Filter Multiple numbers over multiple columns simultaneously

Tags:

r

dplyr

I have a data set like this:

PatientNum<- c(1, 2, 3, 4, 5)
Age<- c(10, 21, 51, 42, 35) 
SurgicalProcedureNumber <- c(21356, 21424, 221356, 12345, 54321) 
OtherSurgicalProcedureNumber   <- c(54321, 21356, 12345, 12345, 21424) 
BloodLoss<-c(5,4,5,10,5)
YetAnotherSurgicalProcedureNumber<-c(11111,22222,33333,21356,555555)

dataset <- data.frame(PatientNum, Age, SurgicalProcedureNumber, OtherSurgicalProcedureNumber, BloodLoss,YetAnotherSurgicalProcedureNumber)  

I want to know when the patients underwent certain surgeries:

NumbersIcareAbout<- c(21356,21424)

I know to find them in one column I could do:

dataset%>%filter(SurgicalProcedureNumber%in%NumbersIcareAbout)

And that would return patient 1 and 2 because it has those codes... but I want it to return the row anytime those numbers shows up in SurgicalProcedureNumber,OtherSurgicalProcedureNumber, or YetAnotherSurgicalProcedureNumber

No idea how and couldn't find it on here.

like image 919
Joe Crozier Avatar asked Jan 03 '23 13:01

Joe Crozier


2 Answers

In dplyr version 1.0.4, we can use filter with if_any

library(dplyr)
dataset %>%
      filter(if_any(ends_with("ProcedureNumber"), ~ . %in% NumbersIcareAbout))
like image 71
akrun Avatar answered Jan 13 '23 13:01

akrun


You can use the OR operator | like this:

dataset %>%
  filter(SurgicalProcedureNumber %in% NumbersIcareAbout | 
           OtherSurgicalProcedureNumber %in% NumbersIcareAbout | 
           YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout)

#   PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1          1  10                   21356                        54321         5                             11111
# 2          2  21                   21424                        21356         4                             22222
# 3          4  42                   12345                        12345        10                             21356
# 4          5  35                   54321                        21424         5                            555555

I'm sure there's a more elegant way, but this filters to those with matches in 2 or more columns (I think):

dataset %>%
  filter((SurgicalProcedureNumber %in% NumbersIcareAbout + 
          OtherSurgicalProcedureNumber %in% NumbersIcareAbout + 
          YetAnotherSurgicalProcedureNumber %in% NumbersIcareAbout) >= 2)

#   PatientNum Age SurgicalProcedureNumber OtherSurgicalProcedureNumber BloodLoss YetAnotherSurgicalProcedureNumber
# 1          2  21                   21424                        21356         4                             22222

It works by summing together the logical vectors that result from your comparisons using %in%. It relies on the fact that TRUE + TRUE gives 2. So, if a row has a value of 2 or greater then it has multiple matches.

like image 21
Lyngbakr Avatar answered Jan 13 '23 13:01

Lyngbakr