Consider this data:
> allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3), id = c(1:3, 2:4, 3:5))
> allt
  day id
1 mon  1
2 mon  2
3 mon  3
4 tue  2
5 tue  3
6 tue  4
7 wed  3
8 wed  4
9 wed  5
In this data frame, day "mon" has ids [1, 2, 3] and "tue" has ids [2, 3, 4]. Their intersection is [2, 3] and their union is [1, 2, 3, 4]. These vectors have lengths 2 and 4, respectively, so the ratio is 2 / 4 = 0.5. That is the number I want to get.
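The ratio described above is the Jaccard index (Jaccard similarity) of the two sets of ids. Written out for the "mon"/"tue" example, with A and B as illustrative names for the two id sets:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
A = \{1, 2, 3\},\; B = \{2, 3, 4\}
\;\Rightarrow\;
J(A, B) = \frac{|\{2, 3\}|}{|\{1, 2, 3, 4\}|} = \frac{2}{4} = 0.5
```

It is 1 when both days have exactly the same ids and 0 when they share none.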
So I am looking for a general way to compute this ratio for every pair of categories when there are more of them.
The result could be in a format similar to a correlation matrix. To be clear, I am only interested in intersections and unions of two categories at a time, so e.g. I don't need a 4-way intersection (Mon, Tue, Wed, Thu), just each 2-day intersection.
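Because only two-day pairs are wanted, one option is to list each unordered pair once in a long table rather than a full symmetric matrix. A minimal sketch using base R's split() and combn(); the helper name jaccard and the output columns day1, day2 and ratio are illustrative choices, not anything from the question:

```r
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id  = c(1:3, 2:4, 3:5))

# Named list with one vector of ids per day (names sorted: mon, tue, wed)
ids_by_day <- split(allt$id, allt$day)

# Hypothetical helper: |intersection| / |union| of two id vectors
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Every unordered pair of days, one pair per column, no self-pairs
pairs <- combn(names(ids_by_day), 2)

res <- data.frame(
  day1  = pairs[1, ],
  day2  = pairs[2, ],
  ratio = apply(pairs, 2, function(p) {
    jaccard(ids_by_day[[p[1]]], ids_by_day[[p[2]]])
  })
)
res
```

With n days this yields choose(n, 2) rows, i.e. the upper triangle of the matrix form without the diagonal of ones.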
In base R, intersect() returns the elements that two vectors have in common (without duplicates), and union() returns the distinct elements found in either vector. The lengths of these two results are the numerator and denominator of the ratio above.
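A quick check of those two functions on the "mon" and "tue" ids from the example data; the variable names mon, tue, common and either are illustrative:

```r
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id  = c(1:3, 2:4, 3:5))

mon <- allt$id[allt$day == "mon"]   # 1 2 3
tue <- allt$id[allt$day == "tue"]   # 2 3 4

common <- intersect(mon, tue)       # 2 3
either <- union(mon, tue)           # 1 2 3 4
length(common) / length(either)     # 0.5
```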
Maybe something like this?
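days <- unique(as.character(allt$day))  # levels() returns NULL for a character column, the data.frame() default since R 4.0
f <- function(x, y) {
  xids <- allt$id[allt$day == x]  # ids seen on day x
  yids <- allt$id[allt$day == y]  # ids seen on day y
  length(intersect(xids, yids)) / length(union(xids, yids))
}
f <- Vectorize(f)  # outer() calls f once with whole vectors, so f must be vectorised
outer(days, days, f)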
# [,1] [,2] [,3]
# [1,] 1.0 0.5 0.2
# [2,] 0.5 1.0 0.5
# [3,] 0.2 0.5 1.0
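Optionally, label the rows and columns with the day names: pipe the result into set_colnames(days) and set_rownames(days) from the magrittr package, or, in base R, store it as res and assign dimnames(res) <- list(days, days).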
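As an alternative (a different technique from the Vectorize()/outer() approach above, not the answerer's method), all pairwise ratios can be computed at once from an id-by-day incidence matrix: crossprod() counts the shared ids for every pair of days, and |A ∪ B| = |A| + |B| - |A ∩ B| gives the union sizes. A minimal sketch, assuming each row of allt is one (day, id) observation; the names inc, inter, sizes, uni and jac are illustrative:

```r
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id  = c(1:3, 2:4, 3:5))

# Rows = ids, columns = days; 1 if the id occurs on that day.
# "> 0" collapses repeated (day, id) rows, "* 1" turns TRUE/FALSE into 1/0.
inc <- unclass(table(allt$id, allt$day) > 0) * 1

inter <- crossprod(inc)                  # inter[i, j] = |ids(day i) ∩ ids(day j)|
sizes <- diag(inter)                     # distinct ids per day
uni   <- outer(sizes, sizes, "+") - inter  # |A ∪ B| = |A| + |B| - |A ∩ B|
jac   <- inter / uni                     # Jaccard ratio for every pair of days
jac
```

This avoids calling intersect() and union() once per pair, which may matter with many days, and the result already carries the day names as row and column names.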