I have a list like this:
mylist <- list(PP = c("PP 1", "OMITTED"),
IN01 = c("DID NOT PARTICIPATE", "PARTICIPATED", "OMITTED"),
RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
LOS = c("LESS THAN 3", "3 TO 100", "100 TO 500", "MORE THAN 500", "LOGICALLY NOT APPLICABLE", "OMITTED"),
COM = c("BAN", "SBAN", "RAL"),
VR1 = c("WITHIN 30", "WITHIN 200", "NOT AVAILABLE", "OMITTED"),
INF = c("A LOT", "SOME", "LITTLE OR NO", "NOT APPLICABLE", "OMITTED"),
IST = c("FULL-TIME", "PART-TIME", "FULL STAFFED", "NOT STAFFED", "LOGICALLY NOT APPLICABLE", "OMITTED"),
CMP = c("ALL", "MOST", "SOME", "NONE", "LOGICALLY NOT APPLICABLE", "OMITTED"))
I have another list like this:
matchlist <- list("INVALID", c("INVALID", "OMITTED OR INVALID"),
c("INVALID", "OMITTED"), "OMITTED", c("NOT REACHED", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "INVALID", "OMITTED OR INVALID"),
c("Not applicable", "Not stated"), c("Not reached", "Not administered/missing by design", "Presented but not answered/invalid"),
c("Not administered/missing by design", "Presented but not answered/invalid"),
"OMITTED OR INVALID",
c("LOGICALLY NOT APPLICABLE", "OMITTED OR INVALID"),
c("NOT REACHED", "OMITTED"),
c("NOT APPLICABLE", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "OMITTED"),
c("LOGICALLY NOT APPLICABLE", "NOT REACHED", "OMITTED"),
"NOT EXCLUDED", c("Default", "Not applicable", "Not stated"), c("Valid Skip", "Not Reached", "Not Applicable", "Invalid", "No Response"),
c("Not administered", "Omitted"),
c("NOT REACHED", "INVALID RESPONSE", "OMITTED"),
c("INVALID RESPONSE", "OMITTED"))
As you can see, some of the vectors in matchlist partially match vectors in mylist. In some cases a vector in matchlist exactly matches part of a vector in mylist. For example, the last three values of RD1 in mylist match the vector in the fifth component of matchlist, but RD2 does not match it, although common values are present. The values of RD2 in mylist ("NOT REACHED", "NOT AN OPTION", "OMITTED"), together and in this order, do not have a match in any of the vectors in matchlist. The same is true for the values of COM in mylist.
What I am trying to achieve is to compare the elements of each vector in mylist against each vector in matchlist, extract the values that are common and match the values in matchlist in the same order, and store them in another list. The desired result should look like this:
$PP
[1] "OMITTED"
$IN01
[1] "OMITTED"
$RD1
[1] "NOT REACHED" "INVALID" "OMITTED"
$RD2
character(0)
$LOS
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$COM
character(0)
$VR1
[1] "OMITTED"
$INF
[1] "NOT APPLICABLE" "OMITTED"
$IST
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$CMP
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
What I tried so far:
Using intersect:
lapply(mylist, function(i) {
intersect(i, lapply(matchlist, function(i) {i}))
})
It returns only the last value in each vector of matchlist ("OMITTED").
Using match through %in%:
lapply(mylist, function(i) {
i[which(i %in% matchlist)]
})
It returns the desired result only for RD1 ("INVALID", "OMITTED"); for the rest it returns just the last value ("OMITTED"), except for COM, which is correct.
Using mapply and intersect:
mapply(intersect, mylist, matchlist)
It returns a long list with a mixture of pretty much everything, including combinations that should not be there, plus a warning about the unequal lengths.
Can someone help, please?
Here is a simple solution using unlist with matchlist:
lapply(mylist, function(x) x[x %in% unlist(matchlist)])
Output (new list):
$PP
[1] "OMITTED"
$IN01
[1] "OMITTED"
$RD1
[1] "NOT REACHED" "INVALID" "OMITTED"
$RD2
[1] "NOT REACHED" "OMITTED"
$LOS
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$COM
character(0)
$VR1
[1] "OMITTED"
$INF
[1] "NOT APPLICABLE" "OMITTED"
$IST
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
$CMP
[1] "LOGICALLY NOT APPLICABLE" "OMITTED"
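Note that flattening matchlist with unlist only checks each value on its own, so for RD2 it keeps "NOT REACHED" and "OMITTED", whereas the question asks for character(0) there. If the stricter behaviour is needed, one possible sketch follows. It rests on an assumption about what the question means: the matched values must sit next to each other in the original vector and, taken together, must equal one of the vectors in matchlist. The helper name match_run is hypothetical, and the data is a shortened copy of the question's lists.

```r
# Shortened copies of the question's data, just enough to show the behaviour.
mylist <- list(
  RD1 = c("YES", "NO", "NOT REACHED", "INVALID", "OMITTED"),
  RD2 = c("YES", "NO", "NOT REACHED", "NOT AN OPTION", "OMITTED"),
  COM = c("BAN", "SBAN", "RAL")
)
matchlist <- list(
  "OMITTED",
  c("NOT REACHED", "INVALID", "OMITTED"),
  c("NOT REACHED", "OMITTED")
)

# Hypothetical helper (not part of the original answer).
# Assumption: a result counts only if the matched values are adjacent
# in x and, as a whole, are identical to one vector in `patterns`.
match_run <- function(x, patterns) {
  pos  <- which(x %in% unlist(patterns))   # positions of values seen anywhere in patterns
  hits <- x[pos]
  # TRUE when there is at least one hit and no gaps between hit positions
  contiguous <- length(pos) > 0 && all(diff(pos) == 1)
  # TRUE when the hits, in order, equal one complete pattern vector
  known <- any(vapply(patterns, identical, logical(1), y = hits))
  if (contiguous && known) hits else character(0)
}

result <- lapply(mylist, match_run, patterns = matchlist)
```

With this rule RD1 keeps its last three values, while RD2 is dropped because "NOT AN OPTION" sits between its two matching values. If the intended rule is different (for example, only the tail of each vector should be compared), the contiguity check would need to be adapted.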