Convert list of individuals to occurrence of pairs in R

Tags: dataframe, r

I need a specific data.frame format for social structure analysis. How can I convert a data.frame containing a list of individuals occurring together at multiple events:

my.df <- data.frame(individual = c("A","B","C","B","C","D"),
                    time = rep(c("event_01","event_02"), each = 3))

  individual     time
1          A event_01
2          B event_01
3          C event_01
4          B event_02
5          C event_02
6          D event_02

into a data.frame containing the number of co-occurrences for each pair (including the [A,A], [B,B], etc. pairs):

ind_1    ind_2   times
  A        A       0
  A        B       1
  A        C       1
  A        D       0
  B        A       1
  B        B       0
  B        C       2
  B        D       1
  C        A       1
  C        B       2
  C        C       0
  C        D       1
  D        A       0
  D        B       1
  D        C       1
  D        D       0

Ladislav Naďo asked Dec 11 '22 23:12


2 Answers

In base R, you could do the following:

data.frame(as.table(`diag<-`(tcrossprod(table(my.df)), 0)))
#    individual individual.1 Freq
# 1           A            A    0
# 2           B            A    1
# 3           C            A    1
# 4           D            A    0
# 5           A            B    1
# 6           B            B    0
# 7           C            B    2
# 8           D            B    1
# 9           A            C    1
# 10          B            C    2
# 11          C            C    0
# 12          D            C    1
# 13          A            D    0
# 14          B            D    1
# 15          C            D    1
# 16          D            D    0

tcrossprod gives you the following:

> tcrossprod(table(my.df))
          individual
individual A B C D
         A 1 1 1 0
         B 1 2 2 1
         C 1 2 2 1
         D 0 1 1 1

That's essentially all the information you are looking for, but you want it in a slightly different form, without the diagonal values.
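
To see why this works: table(my.df) is an individual-by-event incidence matrix, and tcrossprod(m) computes m %*% t(m), so entry [i, j] counts the events in which both i and j appear, while the diagonal holds each individual's own event count. A small check on the question's data (the names inc and co are only illustrative, not part of the answer):

```r
my.df <- data.frame(individual = c("A","B","C","B","C","D"),
                    time = rep(c("event_01","event_02"), each = 3))

# Plain integer matrix: rows are individuals, columns are events
inc <- unclass(table(my.df))

# Explicit matrix product; holds the same values as tcrossprod(inc)
co <- inc %*% t(inc)

co["B", "C"]   # number of events shared by B and C
diag(co)       # number of events each individual attended
```

The diagonal is therefore meaningful (events per individual), it just is not a pair count, which is why it is zeroed in the next step.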

We can set the diagonals to zero with:

`diag<-`(theOutputFromAbove, 0)

Then, to get the long form, trick R into thinking that the resulting matrix is a table by using as.table, and make use of the data.frame method for tables.
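
Putting the steps together, here is a sketch of the same pipeline written out one step at a time, with the columns renamed to match the question. The intermediate names (inc, co, long, res) are my own. Because the matrix is symmetric, labelling the fastest-varying column as ind_2 reproduces the row order asked for in the question.

```r
my.df <- data.frame(individual = c("A","B","C","B","C","D"),
                    time = rep(c("event_01","event_02"), each = 3))

# Incidence table: individuals in rows, events in columns
inc <- table(my.df)

# Co-occurrence counts between every pair of individuals
co <- tcrossprod(inc)

# An individual paired with itself should count as 0
diag(co) <- 0

# Long format via the data.frame method for tables
long <- data.frame(as.table(co))

# The first column varies fastest, so label it ind_2, then reorder columns
names(long) <- c("ind_2", "ind_1", "times")
res <- long[, c("ind_1", "ind_2", "times")]
res
```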

A5C1D2H2I1M1N2O1R2T1 answered Feb 14 '23 16:02


You can do it step by step.

Create the first two variables of the new data.frame:

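# as.character() keeps this working whether `individual` is a factor or,
# as is the default since R 4.0, a character vector (where levels() is NULL)
df2 <- expand.grid(ind_2 = sort(unique(as.character(my.df$individual))),
                   ind_1 = sort(unique(as.character(my.df$individual))))[, 2:1]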

Set the value to 0 for pairs made of the same individual:

df2$times[df2[, 1]==df2[, 2]] <- 0

List the unique combinations of two different individuals:

comb_diff <- combn(sort(unique(as.character(my.df$individual))), 2)

Compute the number of times each unique combination is found together:

times_uni <- apply(comb_diff, 2, function(inds) {
  # an event counts when both individuals of the pair appear in it
  sum(table(my.df$time[my.df$individual %in% inds]) == 2)
})
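
To make the counting step concrete, here is the same logic applied by hand to a single pair. This is only an illustration: the names inds, events, per_event and shared are my own, and it assumes each individual is listed at most once per event (a duplicated row would distort the count).

```r
my.df <- data.frame(individual = c("A","B","C","B","C","D"),
                    time = rep(c("event_01","event_02"), each = 3))

inds <- c("B", "C")   # one pair of individuals

# Events attended by either member of the pair (one entry per attendance)
events <- my.df$time[my.df$individual %in% inds]

# How many members of the pair were present at each event
per_event <- table(events)

# Events where both members were present
shared <- sum(per_event == 2)
shared
```

For the pair B and C this gives 2, matching the B/C rows of the expected output.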

Finally, fill the new data.frame:

# keys for each pair in both orders, e.g. "AB" and "BA"
pair_keys <- c(paste0(comb_diff[1, ], comb_diff[2, ]),
               paste0(comb_diff[2, ], comb_diff[1, ]))
df2$times[match(pair_keys, paste0(df2[, 1], df2[, 2]))] <- rep(times_uni, 2)

df2
#   ind_1 ind_2 times
#1      A     A     0
#2      A     B     1
#3      A     C     1
#4      A     D     0
#5      B     A     1
#6      B     B     0
#7      B     C     2
#8      B     D     1
#9      C     A     1
#10     C     B     2
#11     C     C     0
#12     C     D     1
#13     D     A     0
#14     D     B     1
#15     D     C     1
#16     D     D     0

Cath answered Feb 14 '23 16:02