How to make a matrix of intersections and unions over categories in R?

Tags:

r

Consider this data:

> allt <- data.frame(day = rep(c("mon", "tue", "wed"), each =3), id = c(1:3,2:4,3:5))
> allt
  day id
1 mon  1
2 mon  2
3 mon  3
4 tue  2
5 tue  3
6 tue  4
7 wed  3
8 wed  4
9 wed  5

In the data frame above, day "mon" has ids [1,2,3] and day "tue" has ids [2,3,4]. The intersection of these vectors is [2,3] and their union is [1,2,3,4]. The lengths of these vectors are 2 and 4 respectively, so the ratio is 0.5. That is the number I want to get.
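The arithmetic for this single pair can be checked directly in base R. This is only a minimal sketch of the ratio described above (often called the Jaccard index); the names `mon`, `tue`, and `ratio_mon_tue` are made up for illustration.

```r
# ids observed on each of the two days (copied from the data above)
mon <- c(1, 2, 3)
tue <- c(2, 3, 4)

# size of the intersection divided by size of the union
ratio_mon_tue <- length(intersect(mon, tue)) / length(union(mon, tue))
ratio_mon_tue
# [1] 0.5
```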

So I am looking for a general way to get this ratio for all possible pairs of categories.

The result could be in a format similar to a correlation matrix. To be clear, I am only interested in the intersection and union of two categories at a time, so e.g. I don't need a 4-way intersection (Mon, Tue, Wed, Thu), just the intersection for each pair of days.


asked Apr 24 '19 11:04

tomas hujo

1 Answer

Maybe something like this?

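# unique() works whether day is a character or a factor column;
# levels() returns NULL for character columns (the default since R 4.0.0)
days <- unique(as.character(allt$day))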

f <- function(x, y) {
  xids <- allt$id[allt$day == x]
  yids <- allt$id[allt$day == y]
  length(intersect(xids, yids)) / length(union(xids, yids))
}
f <- Vectorize(f)

outer(days, days, f)

#      [,1] [,2] [,3]
# [1,]  1.0  0.5  0.2
# [2,]  0.5  1.0  0.5
# [3,]  0.2  0.5  1.0

Optionally, pipe the result into magrittr's set_colnames(days) and set_rownames(days) to label the rows and columns with the day names.
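If labelled rows and columns are wanted without an extra package, one possible base R variant is sketched below. It is not the method from the answer above, just an alternative under the same assumptions about the data; the names `ids_by_day`, `jaccard`, and `ratio` are made up for illustration.

```r
allt <- data.frame(day = rep(c("mon", "tue", "wed"), each = 3),
                   id = c(1:3, 2:4, 3:5))

# one vector of ids per day, as a named list
ids_by_day <- split(allt$id, allt$day)

# intersection size divided by union size for two id vectors
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# apply jaccard() to every pair of days; the list names become dimnames
ratio <- sapply(ids_by_day, function(a) {
  sapply(ids_by_day, function(b) jaccard(a, b))
})
ratio
```

Because `split()` returns a named list, the resulting matrix carries the day names as row and column names, so individual entries can be looked up with e.g. `ratio["mon", "wed"]`.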


answered Sep 30 '22 17:09

Robin Gertenbach
