How to make a matrix of pairwise intersection/union ratios over categories in R?

Tags: r

Suppose we have this data:

> allt <- data.frame(day = rep(c("mon", "tue", "wed"), each =3), id = c(1:3,2:4,3:5))
> allt
  day id
1 mon  1
2 mon  2
3 mon  3
4 tue  2
5 tue  3
6 tue  4
7 wed  3
8 wed  4
9 wed  5

In this data frame, day "mon" has ids [1,2,3] and day "tue" has ids [2,3,4]. The intersection of these vectors is [2,3] and their union is [1,2,3,4]. The lengths of these vectors are 2 and 4 respectively, so the ratio is 0.5. That is the number I want to get.
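
For the mon/tue pair, the calculation described above can be sketched in base R as follows. The variable names are only illustrative, and the ratio itself is commonly known as the Jaccard index:

```r
# Sample data from the question
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id  = c(1:3, 2:4, 3:5))

# ids observed on each of the two days (illustrative variable names)
mon_ids <- allt$id[allt$day == "mon"]  # 1 2 3
tue_ids <- allt$id[allt$day == "tue"]  # 2 3 4

common <- intersect(mon_ids, tue_ids)  # 2 3
either <- union(mon_ids, tue_ids)      # 1 2 3 4

# Size of the intersection divided by size of the union
ratio <- length(common) / length(either)
ratio
# [1] 0.5
```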

So I am looking for a general way to get this ratio for all possible pairs of categories.

The result could be in a format similar to a correlation matrix. To be clear, I am only interested in the intersection and union of two categories at a time, so I don't need, for example, a 4-way intersection (Mon, Tue, Wed, Thu), just the intersection for each pair of days.

tomas hujo asked Apr 24 '19 11:04



1 Answer

Maybe something like this?

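# levels() returns NULL when day is a character column (the data.frame()
# default since R 4.0.0), so take the distinct day values directly
days <- unique(as.character(allt$day))

# Ratio of intersection size to union size of the ids for days x and y
f <- function(x, y) {
  xids <- allt$id[allt$day == x]
  yids <- allt$id[allt$day == y]
  length(intersect(xids, yids)) / length(union(xids, yids))
}
# outer() calls f once with whole vectors, so f must be vectorized
f <- Vectorize(f)

outer(days, days, f)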

#      [,1] [,2] [,3]
# [1,]  1.0  0.5  0.2
# [2,]  0.5  1.0  0.5
# [3,]  0.2  0.5  1.0

Optionally, pipe the result into magrittr's set_colnames(days) and set_rownames(days) to label the rows and columns by day.
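
If you would rather avoid the extra package, a base R variant might look like the sketch below. It is an alternative arrangement of the same idea, not the code above: the names ids_by_day, ratio and m are made up for illustration, and the ids are grouped once with split() instead of being filtered on every call.

```r
# Sample data from the question
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id  = c(1:3, 2:4, 3:5))

# One vector of ids per day; split() names the list elements by day
ids_by_day <- split(allt$id, allt$day)
days <- names(ids_by_day)

# Ratio of intersection size to union size for one pair of days
ratio <- function(x, y) {
  a <- ids_by_day[[x]]
  b <- ids_by_day[[y]]
  length(intersect(a, b)) / length(union(a, b))
}

# outer() passes whole vectors, so the scalar function is vectorized first
m <- outer(days, days, Vectorize(ratio))

# Label rows and columns with base R instead of magrittr helpers
dimnames(m) <- list(days, days)
m
#     mon tue wed
# mon 1.0 0.5 0.2
# tue 0.5 1.0 0.5
# wed 0.2 0.5 1.0
```

Note that split() orders the groups by sorted day name (or by factor level if day is a factor), which happens to match the order of appearance in this data.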

Robin Gertenbach answered Sep 30 '22 17:09