I have a data frame which shows membership in three color classes. Numbers refer to unique IDs. One ID may be a part of one group or multiple groups.
dat <- data.frame(BLUE = c(1, 2, 3, 4, 6, NA),
RED = c(2, 3, 6, 7, 9, 13),
GREEN = c(4, 6, 8, 9, 10, 11))
Or, for visual reference:
BLUE  RED  GREEN
   1    2      4
   2    3      6
   3    6      8
   4    7      9
   6    9     10
  NA   13     11
I need to identify and tally individual and cross-group membership (i.e., how many IDs were only in red, how many were in both red and blue, etc.). My desired output is below. Note that the IDs column is for reference only; it would not be in the expected output.
COLOR             TOTAL  IDs (reference only, not needed in final output)
RED               2      (7, 13)
BLUE              1      (1)
GREEN             3      (8, 10, 11)
RED, BLUE         3      (2, 3, 6)
RED, GREEN        2      (6, 9)
BLUE, GREEN       2      (4, 6)
RED, BLUE, GREEN  1      (6)
Does anyone know an efficient way to do this in R? Thanks!
You can use the venn library (especially suited to situations where you do not have NAs in your data):
library(venn)
venn_table <- venn(as.list(dat))
               BLUE RED GREEN counts
                  0   0     0      0
GREEN             0   0     1      3
RED               0   1     0      2
RED:GREEN         0   1     1      1
BLUE              1   0     0      2
BLUE:GREEN        1   0     1      1
BLUE:RED          1   1     0      2
BLUE:RED:GREEN    1   1     1      1
And:
attr(venn_table, "intersections")
$GREEN
[1] 8 10 11
$RED
[1] 7 13
$`RED:GREEN`
[1] 9
$BLUE
[1] 1 NA
$`BLUE:GREEN`
[1] 4
$`BLUE:RED`
[1] 2 3
$`BLUE:RED:GREEN`
[1] 6
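For anyone wondering what these regions mean, the same "exclusive" grouping can be sketched in base R without the package. This is only an illustration, not how venn works internally: each ID is labelled with the exact set of colors it appears in, and IDs are then grouped by that label. The names ids, membership, pattern, exclusive_ids and exclusive_counts are made up for this sketch, and NAs are dropped up front.

```r
dat <- data.frame(BLUE  = c(1, 2, 3, 4, 6, NA),
                  RED   = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# All distinct IDs across the three columns, ignoring NA
all_ids <- unlist(dat, use.names = FALSE)
ids <- sort(unique(all_ids[!is.na(all_ids)]))

# Logical matrix: one row per ID, one column per color
membership <- sapply(dat, function(col) ids %in% col)

# Label each ID with the exact combination of colors it belongs to
pattern <- apply(membership, 1,
                 function(r) paste(colnames(membership)[r], collapse = ":"))

# Group IDs by label and count them (each ID lands in exactly one group)
exclusive_ids    <- split(ids, pattern)
exclusive_counts <- lengths(exclusive_ids)
exclusive_counts
```

Because every ID is assigned to exactly one label, these counts always add up to the number of distinct IDs.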
To also include the IDs:
data.frame(venn_table[2:nrow(venn_table), ],
ID = do.call("rbind", lapply(attr(venn_table, "intersections"), paste0, collapse = ",")))
               BLUE RED GREEN counts      ID
GREEN             0   0     1      3 8,10,11
RED               0   1     0      2    7,13
RED:GREEN         0   1     1      1       9
BLUE              1   0     0      2    1,NA
BLUE:GREEN        1   0     1      1       4
BLUE:RED          1   1     0      2     2,3
BLUE:RED:GREEN    1   1     1      1       6
One way to deal with the NAs:
venn_table2 <- data.frame(venn_table[2:nrow(venn_table), length(venn_table), drop = FALSE],
ID = do.call("rbind", lapply(attr(venn_table, "intersections"), paste0, collapse = ",")))
counts <- venn_table2[1] - with(venn_table2, lengths(regmatches(ID, gregexpr("NA", ID))))
counts
GREEN 3
RED 2
RED:GREEN 1
BLUE 1
BLUE:GREEN 1
BLUE:RED 2
BLUE:RED:GREEN 1
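If the string matching feels fragile, the non-NA entries could probably be counted directly on the list of intersections instead. A minimal sketch, assuming the list has the shape printed above: here intersections is typed in by hand so the snippet runs without the package, and in practice it would be attr(venn_table, "intersections"). The name counts_no_na is made up for this example.

```r
# Hand-copied stand-in for attr(venn_table, "intersections")
intersections <- list(
  GREEN            = c(8, 10, 11),
  RED              = c(7, 13),
  `RED:GREEN`      = 9,
  BLUE             = c(1, NA),
  `BLUE:GREEN`     = 4,
  `BLUE:RED`       = c(2, 3),
  `BLUE:RED:GREEN` = 6
)

# Count only the non-NA IDs in each intersection
counts_no_na <- vapply(intersections,
                       function(ids) sum(!is.na(ids)),
                       integer(1))
counts_no_na
```

This avoids pasting the IDs into a string and then searching that string for "NA".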
A more elegant way to deal with the NAs (based on a comment from @M--) is to drop them from each group before calling venn:
print(venn(Map(function(x) x[!is.na(x)], as.list(dat))))
               BLUE RED GREEN counts
                  0   0     0      0
GREEN             0   0     1      3
RED               0   1     0      2
RED:GREEN         0   1     1      1
BLUE              1   0     0      1
BLUE:GREEN        1   0     1      1
BLUE:RED          1   1     0      2
BLUE:RED:GREEN    1   1     1      1
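Note that the counts above are exclusive (ID 6 is counted only under BLUE:RED:GREEN), whereas the desired output in the question appears to count 6 under every multi-color combination it belongs to, while the single-color rows list IDs found only in that color. A rough base R sketch under that reading of the question, with NAs dropped first; sets, combos, ids_for and result are names made up for this example:

```r
dat <- data.frame(BLUE  = c(1, 2, 3, 4, 6, NA),
                  RED   = c(2, 3, 6, 7, 9, 13),
                  GREEN = c(4, 6, 8, 9, 10, 11))

# One vector of distinct, non-NA IDs per color
sets <- lapply(dat, function(col) unique(col[!is.na(col)]))

# Every non-empty combination of colors, in the order used in the question
colors <- c("RED", "BLUE", "GREEN")
combos <- unlist(lapply(seq_along(colors),
                        function(k) combn(colors, k, simplify = FALSE)),
                 recursive = FALSE)

# IDs shared by all colors in a combination; for a single color,
# keep only the IDs that appear in no other color
ids_for <- function(combo) {
  shared <- Reduce(intersect, sets[combo])
  if (length(combo) == 1) {
    others <- unlist(sets[setdiff(names(sets), combo)], use.names = FALSE)
    shared <- setdiff(shared, others)
  }
  shared
}

result <- data.frame(
  COLOR = vapply(combos, paste, character(1), collapse = ", "),
  TOTAL = vapply(combos, function(cb) length(ids_for(cb)), integer(1)),
  stringsAsFactors = FALSE
)
result
```

If the single-color rows should instead count every ID in that color, the if block inside ids_for can simply be removed.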