 

Restructuring team- to individual-level data in R (while retaining team-level information)

Tags:

r

My current data looks like this:

Person  Team
  10    100
  11    100
  12    100
  10    200
  11    200
  14    200
  15    200

I want to infer who knew one another based on which teams they were on together. I also want a count of how many times each dyad was on a team together, and I want to keep track of the team identification codes that link each pair of people. In other words, I want to create a data set that looks like this:

Person1 Person2 Count   Team1   Team2   Team3
   10      11     2      100     200     NA
   10      12     1      100     NA      NA
   11      12     1      100     NA      NA
   10      14     1      200     NA      NA
   10      15     1      200     NA      NA
   11      14     1      200     NA      NA
   11      15     1      200     NA      NA
   14      15     1      200     NA      NA

The resulting data set captures the relationships that can be inferred from team membership in the original data set. The "Count" variable is the number of teams a pair of people was on together. The "Team1", "Team2", and "Team3" variables list the team ID(s) that link each pair of people. The order in which people or team IDs are listed does not matter. Teams range in size from 2 to 8 members.
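To make the inference rule concrete: a team of n members implies choose(n, 2) = n(n - 1)/2 dyads, and base R's combn() enumerates them. A minimal sketch using team 200 from the data above; sorting the members first is an illustrative choice that keeps Person1 < Person2 in every pair, so the same dyad coming from two different teams can be matched and counted.

```r
# Members of team 200 in the example data
team200 <- c(10, 11, 14, 15)

# Every unordered pair of members: a 2 x choose(4, 2) matrix, one pair per column.
# Assumes the team has at least 2 members (combn() on a single number n
# would enumerate 1:n instead).
pairs200 <- combn(sort(team200), 2)
pairs200
# columns: 10-11, 10-14, 10-15, 11-14, 11-15, 14-15

# Number of dyads implied by teams of 2 to 8 members
choose(2:8, 2)
# 1 3 6 10 15 21 28
```

So the largest teams (8 members) each contribute 28 rows to the pair list before pairs are counted across teams.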

asked Jan 06 '15 at 23:01 by waxattax



2 Answers

Here's a "data.table" solution that produces essentially the output you describe (albeit with quite a mouthful of code):

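library(data.table)

# The data from the question
d <- data.frame(Person = c(10, 11, 12, 10, 11, 14, 15),
                Team   = c(100, 100, 100, 200, 200, 200, 200))
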
dcast.data.table(
  dcast.data.table(
    as.data.table(d)[, combn(Person, 2), by = Team][
      , ind := paste0("Person", c(1, 2))][
        , time := sequence(.N), by = list(Team, ind)], 
    time + Team ~ ind, value.var = "V1")[
      , c("count", "time") := list(.N, sequence(.N)), by = list(Person1, Person2)],
  Person1 + Person2 + count ~ time, value.var = "Team")
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
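The numbered columns 1 and 2 come from the time variable in the last dcast formula. If you want them named Team1, Team2, ... as in the question, a small base R sketch is below; it uses a hypothetical two-row data.frame standing in for the result. On the data.table itself, data.table::setnames(out, old, new) does the same rename by reference.

```r
# Hypothetical stand-in for the result above: same column names, fewer rows
out <- data.frame(Person1 = c(10, 10), Person2 = c(11, 12), count = c(2, 1),
                  `1` = c(100, 100), `2` = c(200, NA), check.names = FALSE)

# Prefix every purely numeric column name with "Team"
is_time_col <- grepl("^[0-9]+$", names(out))
names(out)[is_time_col] <- paste0("Team", names(out)[is_time_col])
names(out)
# "Person1" "Person2" "count" "Team1" "Team2"
```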

Update: Step-by-step version of the above

To understand what's happening above, here's a step-by-step approach:

## The following creates a long data.table with 4 columns:
##   Team, V1, ind, and time
step1 <- as.data.table(d)[
  , combn(Person, 2), by = Team][
    , ind := paste0("Person", c(1, 2))][
      , time := sequence(.N), by = list(Team, ind)]
head(step1)
#    Team V1     ind time
# 1:  100 10 Person1    1
# 2:  100 11 Person2    1
# 3:  100 10 Person1    2
# 4:  100 12 Person2    2
# 5:  100 11 Person1    3
# 6:  100 12 Person2    3

## Here, we make the data "wide"
step2 <- dcast.data.table(step1, time + Team ~ ind, value.var = "V1")
step2
#    time Team Person1 Person2
# 1:    1  100      10      11
# 2:    1  200      10      11
# 3:    2  100      10      12
# 4:    2  200      10      14
# 5:    3  100      11      12
# 6:    3  200      10      15
# 7:    4  200      11      14
# 8:    5  200      11      15
# 9:    6  200      14      15

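## Add a "count" column and a "time" column, computed
##   within each Person1/Person2 pair:
##   "count" (.N) is the number of teams the pair shares;
##   "time" (sequence(.N)) numbers those teams 1, 2, ... and
##   becomes the column index when going wide.
## Note: := works by reference, so step3 is the same object
##   as step2, and the old "time" column is overwritten.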
step3 <- step2[, c("count", "time") := list(.N, sequence(.N)), 
               by = list(Person1, Person2)]
step3
#    time Team Person1 Person2 count
# 1:    1  100      10      11     2
# 2:    2  200      10      11     2
# 3:    1  100      10      12     1
# 4:    1  200      10      14     1
# 5:    1  100      11      12     1
# 6:    1  200      10      15     1
# 7:    1  200      11      14     1
# 8:    1  200      11      15     1
# 9:    1  200      14      15     1

## The final step of going wide
out <- dcast.data.table(step3, Person1 + Person2 + count ~ time, 
                        value.var = "Team")
out
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
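If the data.table idioms in step 3 are unfamiliar: within each Person1/Person2 group, .N is the group size and sequence(.N) numbers the rows 1, 2, and so on. A base R sketch of the same two columns with ave(), using a plain data.frame whose values are copied from the step2 output above (this is an illustration, not part of the answer's code):

```r
# step2 rebuilt as a plain data.frame from the printed output above
step2 <- data.frame(
  Team    = c(100, 200, 100, 200, 100, 200, 200, 200, 200),
  Person1 = c(10, 10, 10, 10, 11, 10, 11, 11, 14),
  Person2 = c(11, 11, 12, 14, 12, 15, 14, 15, 15)
)
pair_key <- paste(step2$Person1, step2$Person2)  # one key per dyad

# Like .N by pair: group size repeated on every row of the group
step2$count <- ave(seq_along(pair_key), pair_key, FUN = length)
# Like sequence(.N) by pair: running index 1, 2, ... within each group
step2$time <- ave(seq_along(pair_key), pair_key, FUN = seq_along)
step2
```

Only the 10/11 pair appears twice, so it is the only row group with count 2 and a time of 2.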
answered Oct 17 '22 at 23:10 by A5C1D2H2I1M1N2O1R2T1



Building on @Gregor's approach and using Gregor's data (dd), I tried to add the team columns. I could not produce exactly what you requested, but this may be useful. I used full_join() from the development version of dplyr (0.4). First, I created a data frame of all pairwise combinations of Person within each team using combn() and saved it as a. Then I split a by team and combined the pieces with full_join(), which gives one team column each for teams 100 and 200. Finally, I used rename() to change the column names and select() to put the columns in your order.

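library(dplyr)

# Gregor's data; assumed here to match the data in the question
dd <- data.frame(Person = c(10, 11, 12, 10, 11, 14, 15),
                 Team   = c(100, 100, 100, 200, 200, 200, 200))

# All pairs of Person within each team (columns X1, X2), plus Team
a <- group_by(dd, Team) %>%
  do(data.frame(t(combn(.$Person, 2)))) %>%
  data.frame()

# Full join of team 100's pairs with team 200's pairs
full_join(filter(a, Team == 100), filter(a, Team == 200), by = c("X1", "X2")) %>%
  rename(Person1 = X1, Person2 = X2, Team1 = Team.x, Team2 = Team.y) %>%
  select(Person1, Person2, Team1, Team2)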

#  Person1 Person2 Team1 Team2
#1      10      11   100   200
#2      10      12   100    NA
#3      11      12   100    NA
#4      10      14    NA   200
#5      10      15    NA   200
#6      11      14    NA   200
#7      11      15    NA   200
#8      14      15    NA   200
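The join above hardcodes teams 100 and 200. A base R sketch that generalizes the same idea to any number of teams, assuming dd holds the question's data: build one pairs table per team, name each team's column after its ID (Team100, Team200, ... is an illustrative naming choice), and fold the tables together with Reduce() over merge(..., all = TRUE), which is a full outer join like full_join(). Note that merge() sorts rows by the key columns.

```r
dd <- data.frame(Person = c(10, 11, 12, 10, 11, 14, 15),
                 Team   = c(100, 100, 100, 200, 200, 200, 200))

# One data frame of pairs per team; assumes every team has >= 2 members.
# Sorting keeps Person1 < Person2 so the same dyad matches across teams.
per_team <- lapply(split(dd, dd$Team), function(g) {
  m <- combn(sort(g$Person), 2)  # one column per pair
  res <- data.frame(Person1 = m[1, ], Person2 = m[2, ])
  res[[paste0("Team", g$Team[1])]] <- g$Team[1]  # e.g. column "Team100"
  res
})

# Successive full outer joins on the pair, for however many teams there are
wide <- Reduce(function(x, y) merge(x, y, by = c("Person1", "Person2"), all = TRUE),
               per_team)
wide
```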

EDIT

There are surely better ways of doing this, but this is the closest I could get. In this version, I added the count with another join.

# All pairs of Person within each team (columns X1, X2), plus Team
a <- group_by(dd, Team) %>%
  do(data.frame(t(combn(.$Person, 2)))) %>%
  data.frame()

full_join(filter(a, Team == 100), filter(a, Team == 200), by = c("X1", "X2")) %>%
  full_join(count(a, X1, X2), by = c("X1", "X2")) %>%  # n = number of shared teams
  rename(Person1 = X1, Person2 = X2, Team1 = Team.x, Team2 = Team.y, Count = n) %>%
  select(Person1, Person2, Count, Team1, Team2)

#  Person1 Person2 Count Team1 Team2
#1      10      11     2   100   200
#2      10      12     1   100    NA
#3      11      12     1   100    NA
#4      10      14     1    NA   200
#5      10      15     1    NA   200
#6      11      14     1    NA   200
#7      11      15     1    NA   200
#8      14      15     1    NA   200
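For the exact layout in the question (Count, then Team1, Team2, ... packed from the left) without hardcoding team IDs, here is a base R sketch. It is not either answer's code; it combines combn() for the pairs, ave() for the count and a within-pair index, and reshape() to go wide, which is the same long-to-wide idea as the final dcast step in the data.table answer.

```r
d <- data.frame(Person = c(10, 11, 12, 10, 11, 14, 15),
                Team   = c(100, 100, 100, 200, 200, 200, 200))

# Long table: one row per (pair, team); assumes every team has >= 2 members
pairs <- do.call(rbind, lapply(split(d, d$Team), function(g) {
  m <- combn(sort(g$Person), 2)
  data.frame(Person1 = m[1, ], Person2 = m[2, ], Team = g$Team[1])
}))
rownames(pairs) <- NULL

key <- paste(pairs$Person1, pairs$Person2)
pairs$Count <- ave(seq_along(key), key, FUN = length)    # teams shared
pairs$time  <- ave(seq_along(key), key, FUN = seq_along) # 1, 2, ... per pair

# Long -> wide: Team.1, Team.2, ... then rename to Team1, Team2, ...
out <- reshape(pairs, idvar = c("Person1", "Person2", "Count"),
               timevar = "time", direction = "wide")
names(out) <- sub("^Team\\.", "Team", names(out))
rownames(out) <- NULL
out
```

With the question's data this gives eight rows, with Count = 2 and Team2 = 200 only for the 10/11 pair; a pair that shared k teams fills Team1 through Teamk.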
answered Oct 17 '22 at 22:10 by jazzurro
