My current data looks like this:
Person Team
10 100
11 100
12 100
10 200
11 200
14 200
15 200
I want to infer who knew one another, based on what teams they were on together. I also want a count of how many times a dyad was on a team together, and I want to keep track of the team identification codes that link each pair of people. In other words, I want to create a data set that looks like this:
Person1 Person2 Count Team1 Team2 Team3
10 11 2 100 200 NA
10 12 1 100 NA NA
11 12 1 100 NA NA
10 14 1 200 NA NA
10 15 1 200 NA NA
11 14 1 200 NA NA
11 15 1 200 NA NA
14 15 1 200 NA NA
The resulting data set captures the relationships that can be inferred based on the teams that were outlined in the original data set. The "Count" variable reflects the number of instances that a pair of people was on a team together. The "Team1", "Team2", and "Team3" variables list the team ID(s) that link each pair of people to one another. It doesn't make a difference which person/team ID is listed first versus second. Teams range in size from 2 members to 8 members.
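The answers below refer to this data as d (data.table answer) and dd (dplyr answer, described as "Gregor's data", which is not shown here; I assume it holds the same values). Below is a minimal way to build it. It also includes a quick check of how many dyads to expect: a team of n members yields choose(n, 2) pairs.

```r
## The question's data; dd is assumed to hold the same values as d
d <- data.frame(
  Person = c(10, 11, 12, 10, 11, 14, 15),
  Team   = c(100, 100, 100, 200, 200, 200, 200)
)
dd <- d

## Members per team, then dyads per team: choose(n, 2)
team_sizes <- as.vector(table(d$Team))
choose(team_sizes, 2)
# [1] 3 6

## For the stated team sizes of 2 to 8 members
choose(2:8, 2)
# [1]  1  3  6 10 15 21 28
```

So this example has 3 + 6 = 9 within-team pairs before any dyads are merged. A team of 8 would contribute 28 pairs.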
Here's a "data.table" solution that seems to get you what you want (albeit with quite a mouthful of code):
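library(data.table)
## d is assumed to hold the question's data (columns Person and Team)
dcast.data.table(                  # outer cast (runs last): one column per shared team
  dcast.data.table(                # inner cast: each pair as a Person1/Person2 row
    as.data.table(d)[, combn(Person, 2), by = Team][
      , ind := rep_len(paste0("Person", c(1, 2)), .N)][  # recent data.table rejects a length-2 RHS here, so recycle explicitly
      , time := sequence(.N), by = list(Team, ind)],
    time + Team ~ ind, value.var = "V1")[
    , c("count", "time") := list(.N, sequence(.N)), by = list(Person1, Person2)],
  Person1 + Person2 + count ~ time, value.var = "Team")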
# Person1 Person2 count 1 2
# 1: 10 11 2 100 200
# 2: 10 12 1 100 NA
# 3: 10 14 1 200 NA
# 4: 10 15 1 200 NA
# 5: 11 12 1 100 NA
# 6: 11 14 1 200 NA
# 7: 11 15 1 200 NA
# 8: 14 15 1 200 NA
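The first step of the chain depends on how combn() lays out its result. It returns a 2-row matrix with one pair per column, and data.table reads that column by column into V1. That is why V1 alternates between the first and second member of each pair, and why a repeating "Person1", "Person2" label lines up with it. Below is a small base R illustration using team 100's members. It also covers one caveat: combn() keeps the input order. If Person were not sorted within a team in your real data, the same dyad could show up as both (10, 11) and (11, 10) and be counted separately. Writing combn(sort(Person), 2) in the first step is one way to guard against that.

```r
## combn() returns a 2-row matrix with one column per pair
m <- combn(c(10, 11, 12), 2)
m
#      [,1] [,2] [,3]
# [1,]   10   10   11
# [2,]   11   12   12

## Read column by column, each pair becomes two consecutive values
v <- as.vector(m)
v
# [1] 10 11 10 12 11 12

## combn() keeps input order; sorting first gives one canonical order per dyad
combn(c(11, 10), 2)[, 1]
# [1] 11 10
combn(sort(c(11, 10)), 2)[, 1]
# [1] 10 11
```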
To understand what's happening above, here's a step-by-step approach:
## The following would be a long data.table with 4 columns:
## Team, V1, ind, and time
step1 <- as.data.table(d)[
, combn(Person, 2), by = Team][
  , ind := rep_len(paste0("Person", c(1, 2)), .N)][  # explicit recycling: Person1, Person2, Person1, ...
, time := sequence(.N), by = list(Team, ind)]
head(step1)
# Team V1 ind time
# 1: 100 10 Person1 1
# 2: 100 11 Person2 1
# 3: 100 10 Person1 2
# 4: 100 12 Person2 2
# 5: 100 11 Person1 3
# 6: 100 12 Person2 3
## Here, we make the data "wide"
step2 <- dcast.data.table(step1, time + Team ~ ind, value.var = "V1")
step2
# time Team Person1 Person2
# 1: 1 100 10 11
# 2: 1 200 10 11
# 3: 2 100 10 12
# 4: 2 200 10 14
# 5: 3 100 11 12
# 6: 3 200 10 15
# 7: 4 200 11 14
# 8: 5 200 11 15
# 9: 6 200 14 15
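## Add a "count" column and overwrite "time", grouped by Person1 and Person2.
## "count" is the number of teams the pair shares; "time" numbers those
## teams 1, 2, ... so that each one becomes its own column when going wide.
## Note that := modifies step2 by reference, so step2 changes as well.
step3 <- step2[, c("count", "time") := list(.N, sequence(.N)),
               by = list(Person1, Person2)]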
step3
# time Team Person1 Person2 count
# 1: 1 100 10 11 2
# 2: 2 200 10 11 2
# 3: 1 100 10 12 1
# 4: 1 200 10 14 1
# 5: 1 100 11 12 1
# 6: 1 200 10 15 1
# 7: 1 200 11 14 1
# 8: 1 200 11 15 1
# 9: 1 200 14 15 1
## The final step of going wide
out <- dcast.data.table(step3, Person1 + Person2 + count ~ time,
value.var = "Team")
out
# Person1 Person2 count 1 2
# 1: 10 11 2 100 200
# 2: 10 12 1 100 NA
# 3: 10 14 1 200 NA
# 4: 10 15 1 200 NA
# 5: 11 12 1 100 NA
# 6: 11 14 1 200 NA
# 7: 11 15 1 200 NA
# 8: 14 15 1 200 NA
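Step 3's count and time columns are a group size (.N) and a within-group row number (sequence(.N)). If base R is more familiar, ave() reproduces the same two columns. This is only an illustration, not part of the answer. The Person1/Person2 values are copied from step2's printed output.

```r
## Person1 / Person2 columns copied from step2's output
p1 <- c(10, 10, 10, 10, 11, 10, 11, 11, 14)
p2 <- c(11, 11, 12, 14, 12, 15, 14, 15, 15)
pair <- paste(p1, p2)  # one key per dyad

## Group size: how many teams each dyad shares (like .N)
count <- ave(seq_along(pair), pair, FUN = length)
## Row number within each dyad (like sequence(.N))
time <- ave(seq_along(pair), pair, FUN = seq_along)

count
# [1] 2 2 1 1 1 1 1 1 1
time
# [1] 1 2 1 1 1 1 1 1 1
```

Both vectors match the count and time columns printed for step3.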
Following @Gregor and using Gregor's data, I tried to add team columns. I could not produce exactly what you requested, but this may be useful. Using full_join in the dev version of dplyr (dplyr 0.4), I did the following. I created a data frame for each team with all combinations of Person using combn and saved the result as the object a. Then I split a by team and used full_join. In this way I tried to create team columns, at least for teams 100 and 200. I used rename to change the column names and select to order the columns the way you want.
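library(dplyr)
## dd holds the example data (Gregor's version, columns Person and Team).
## a: one row per within-team pair, with columns Team, X1, X2
a <- group_by(dd, Team) %>%
  do(data.frame(t(combn(.$Person, 2)))) %>%
  data.frame()
## Put team 100's and team 200's pairs side by side, matched on the pair
full_join(filter(a, Team == "100"), filter(a, Team == "200"), by = c("X1", "X2")) %>%
  rename(Person1 = X1, Person2 = X2, Team1 = Team.x, Team2 = Team.y) %>%
  select(Person1, Person2, Team1, Team2)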
# Person1 Person2 Team1 Team2
#1 10 11 100 200
#2 10 12 100 NA
#3 11 12 100 NA
#4 10 14 NA 200
#5 10 15 NA 200
#6 11 14 NA 200
#7 11 15 NA 200
#8 14 15 NA 200
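The t(combn(...)) inside do() is what produces the X1 and X2 columns. combn() puts one pair per column, t() turns that into one pair per row, and data.frame() names unnamed matrix columns X1, X2. Below is a small base R illustration on team 100's members.

```r
## combn(): one pair per column; t(): one pair per row
pairs <- data.frame(t(combn(c(10, 11, 12), 2)))
pairs
#   X1 X2
# 1 10 11
# 2 10 12
# 3 11 12
```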
EDIT
I am sure there are better ways to do this, but this is the closest I could get. In this version I add the count with a second join.
## Same pairs table a as above
a <- group_by(dd, Team) %>%
  do(data.frame(t(combn(.$Person, 2)))) %>%
  data.frame()
full_join(filter(a, Team == "100"), filter(a, Team == "200"), by = c("X1", "X2")) %>%
  ## count(a, X1, X2) adds n, the number of teams each pair shares
  full_join(count(a, X1, X2), by = c("X1", "X2")) %>%
  rename(Person1 = X1, Person2 = X2, Team1 = Team.x, Team2 = Team.y, Count = n) %>%
  select(Person1, Person2, Count, Team1, Team2)
# Person1 Person2 Count Team1 Team2
#1 10 11 2 100 200
#2 10 12 1 100 NA
#3 11 12 1 100 NA
#4 10 14 1 NA 200
#5 10 15 1 NA 200
#6 11 14 1 NA 200
#7 11 15 1 NA 200
#8 14 15 1 NA 200
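Both joins above hard-code teams 100 and 200, so they do not scale to more teams. Below is a sketch that handles any number of teams. It is written in base R, which is my substitution and not the approach used in either answer, and it assumes data with columns Person and Team as in the question. It sorts members within each team so that every dyad has one canonical order. It also names the wide columns Team1, Team2, ... as requested, with as many Team columns as the most-connected dyad needs.

```r
## The question's data (columns Person and Team)
d <- data.frame(
  Person = c(10, 11, 12, 10, 11, 14, 15),
  Team   = c(100, 100, 100, 200, 200, 200, 200)
)

## 1. All within-team pairs, one row each. Sorting the members first gives
##    every dyad a single order, e.g. (10, 11) and never (11, 10).
pairs <- do.call(rbind, lapply(split(d, d$Team), function(g) {
  p <- sort(unique(g$Person))
  if (length(p) < 2) return(NULL)  # a one-person team has no pairs
  cmb <- combn(p, 2)
  data.frame(Person1 = cmb[1, ], Person2 = cmb[2, ], Team = g$Team[1])
}))

## 2. Count = number of shared teams; k numbers each dyad's teams 1, 2, ...
key <- paste(pairs$Person1, pairs$Person2)
pairs$Count <- ave(seq_along(key), key, FUN = length)
pairs$k     <- ave(seq_along(key), key, FUN = seq_along)

## 3. Go wide: one TeamK column per shared team, NA where a dyad shares fewer
out <- reshape(pairs, idvar = c("Person1", "Person2", "Count"),
               timevar = "k", direction = "wide", sep = "")
out <- out[order(out$Person1, out$Person2), ]
rownames(out) <- NULL
out  # 8 rows; (10, 11) has Count 2, Team1 100, Team2 200
```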