
Partial intersection of elements across vectors in two lists

I have a list like this:

mylist <- list(PP = c("PP 1", "OMITTED"),
           IN01 = c("DID NOT PARTICIPATE", "PARTICIPATED", "OMITTED"),                     
           RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
           RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
           LOS = c("LESS THAN 3", "3 TO 100", "100 TO 500", "MORE THAN 500", "LOGICALLY NOT APPLICABLE", "OMITTED"),
           COM = c("BAN", "SBAN", "RAL"), 
           VR1 = c("WITHIN 30", "WITHIN 200", "NOT AVAILABLE", "OMITTED"),                         
           INF = c("A LOT", "SOME", "LITTLE OR NO", "NOT APPLICABLE", "OMITTED"),               
           IST = c("FULL-TIME", "PART-TIME", "FULL STAFFED", "NOT STAFFED", "LOGICALLY NOT APPLICABLE", "OMITTED"),
           CMP = c("ALL", "MOST", "SOME", "NONE", "LOGICALLY NOT APPLICABLE", "OMITTED"))

I have another list like this:

matchlist <- list("INVALID", c("INVALID", "OMITTED OR INVALID"),
c("INVALID", "OMITTED"), "OMITTED", c("NOT REACHED", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED OR INVALID"),
c("Not applicable", "Not stated"), c("Not reached", "Not administered/missing by design", "Presented but not answered/invalid"),
c("Not administered/missing by design", "Presented but not answered/invalid"),
"OMITTED OR INVALID",
c("LOGICALLY NOT APPLICABLE", "OMITTED OR INVALID"),
c("NOT REACHED", "OMITTED"),
c("NOT APPLICABLE", "OMITTED"), 
c("LOGICALLY NOT APPLICABLE", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "NOT REACHED", "OMITTED"),
"NOT EXCLUDED", c("Default", "Not applicable", "Not stated"), c("Valid Skip", "Not Reached", "Not Applicable", "Invalid", "No Response"),
c("Not administered", "Omitted"),
c("NOT REACHED", "INVALID RESPONSE", "OMITTED"),
c("INVALID RESPONSE", "OMITTED"))

As you can see, some vectors in matchlist partially match vectors in mylist, and in some cases a vector in matchlist exactly matches the trailing part of a vector in mylist. For example, the last three values of RD1 in mylist match the fifth component of matchlist, but RD2 does not match it, although the two share some values. The values of RD2 ("NOT REACHED", "NOT AN OPTION", "OMITTED"), taken together and in this order, do not match any of the vectors in matchlist. The same holds for the values of COM.

What I am trying to achieve is to compare each vector in mylist against each vector in matchlist, extract the values that are common and appear in the same order as in matchlist, and store them in a new list. The desired result should look like this:

$PP
[1] "OMITTED"

$IN01
[1] "OMITTED"

$RD1
[1] "NOT REACHED" "INVALID" "OMITTED"

$RD2
character(0)

$LOS
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"

$COM
character(0)

$VR1
[1] "OMITTED"

$INF
[1] "NOT APPLICABLE" "OMITTED"

$IST
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"

$CMP
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"

What I tried so far:

Using intersect

lapply(mylist, function(i) {
  intersect(i, lapply(matchlist, function(i) {i}))
})

It returns only the last value in each vector of matchlist ("OMITTED").

Using match through %in%:

lapply(mylist, function(i) {
  i[which(i %in% matchlist)]
})

For RD1 this returns only "INVALID" and "OMITTED"; for the rest it returns just the last value ("OMITTED"), except for COM, which is correct.
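
This behaviour appears to follow from how %in% treats a list: the list is converted to a character vector component by component, so only components that hold a single string can ever be matched. A minimal sketch with a small made-up list (tbl is a hypothetical name, not part of the data above):

```r
# Hypothetical miniature of matchlist: two single strings and one longer vector.
tbl <- list("INVALID", c("NOT REACHED", "OMITTED"), "OMITTED")

x <- c("NOT REACHED", "INVALID", "OMITTED")

# %in% coerces the list to character; the two-element component becomes one
# deparsed string, so "NOT REACHED" is not found inside it.
hits <- x %in% tbl

# Flattening the list first exposes every individual value for matching.
hits_flat <- x %in% unlist(tbl)
```

Here hits is TRUE only for "INVALID" and "OMITTED", while hits_flat is TRUE for all three values.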

Using mapply and intersect:

mapply(intersect, mylist, matchlist)

This returns a long list with a mixture of pretty much everything, including combinations that should not be there, plus a warning about the unequal lengths.

Can someone help, please?

Asked by panman, Dec 23 '22 09:12


1 Answer

Here is a simple solution that flattens matchlist with unlist and keeps every value of each vector in mylist that occurs anywhere in it:

lapply(mylist, function(x) x[x %in% unlist(matchlist)])

Output (new list, $RD2 not shown):

$PP
[1] "OMITTED"

$IN01
[1] "OMITTED"

$RD1
[1] "NOT REACHED" "INVALID"     "OMITTED"    

$LOS
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"                 

$COM
character(0)

$VR1
[1] "OMITTED"

$INF
[1] "NOT APPLICABLE" "OMITTED"       

$IST
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"                 

$CMP
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"                 
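
Note that the one-liner above keeps any value that occurs somewhere in matchlist, so for RD2 it returns "NOT REACHED" "OMITTED" rather than the character(0) the question asks for. A stricter variant is sketched below under one possible reading of the requirement: take the tail of each vector starting at its first value found in matchlist, and keep it only if that tail is identical to one of the vectors in matchlist. The helper match_tail is a hypothetical name, and the data are reduced copies of the question's lists to keep the example short.

```r
# Reduced copies of the question's data, kept short for illustration.
mylist <- list(
  RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
  RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
  COM = c("BAN", "SBAN", "RAL"),
  INF = c("A LOT", "SOME", "LITTLE OR NO", "NOT APPLICABLE", "OMITTED")
)
matchlist <- list(
  "OMITTED",
  c("NOT REACHED", "INVALID", "OMITTED"),
  c("NOT REACHED", "OMITTED"),
  c("NOT APPLICABLE", "OMITTED")
)

# match_tail() is a hypothetical helper, not part of base R.
match_tail <- function(x, candidates) {
  known <- unlist(candidates)
  first <- which(x %in% known)[1]          # NA when no value of x is known
  if (is.na(first)) return(character(0))
  tail_x <- x[first:length(x)]             # everything from the first known value on
  # Keep the tail only if it equals one candidate vector exactly, order included.
  is_match <- vapply(candidates, function(m) identical(tail_x, m), logical(1))
  if (any(is_match)) tail_x else character(0)
}

loose  <- lapply(mylist, function(x) x[x %in% unlist(matchlist)])
strict <- lapply(mylist, match_tail, candidates = matchlist)
```

With these reduced data, strict$RD2 and strict$COM are character(0), while strict$RD1 and strict$INF keep their trailing values. Other readings of "same order" are possible (for example, the longest suffix that equals a matchlist vector would return "OMITTED" for RD2), so the rule should be checked against the intended result.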

Answered by Carles Mitjans, Dec 28 '22 06:12