 

How to create a random matching between the rows of two data.tables (or data.frames)

Tags:

r

data.table

For this example, I'll use the data.table package.

Suppose you have a table of coaches

library(data.table)
coaches <- data.table(CoachID=c(1,2,3), CoachName=c("Bob","Sue","John"), NumPlayers=c(2,3,0))
coaches
   CoachID CoachName NumPlayers
1:       1       Bob          2
2:       2       Sue          3
3:       3      John          0

and a table of players

players <- data.table(PlayerID=c(1,2,3,4,5,6), PlayerName=c("Abe","Bart","Chad","Dalton","Egor","Frank"))
players
   PlayerID PlayerName
1:        1        Abe
2:        2       Bart
3:        3       Chad
4:        4     Dalton
5:        5       Egor
6:        6      Frank

You want to match each coach with a set of players such that

  • The number of players tied to each coach is defined by the NumPlayers field
  • No two coaches are tied to the same player
  • Players and coaches are matched randomly

How do you do this? For example, one valid result would be:

exampleResult <- data.table(CoachID=c(1,1,2,2,2,3), PlayerID=c(3,1,2,5,6,NA))
exampleResult

   CoachID PlayerID
1:       1        3
2:       1        1
3:       2        2
4:       2        5
5:       2        6
6:       3       NA
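It can help to pin down what "valid" means before writing the matching. The sketch below assumes a result laid out like `exampleResult` (one row per coach/player pair, `NA` for a coach with no players). `isValidMatching` is a hypothetical helper, not part of data.table, and it checks only the first two requirements, since randomness can't be judged from a single result.

```r
library(data.table)

# Hypothetical helper (not from the post): checks requirements 1 and 2
isValidMatching <- function(result, coaches) {
  matched <- result[!is.na(PlayerID)]
  # Requirement 1: each coach has exactly NumPlayers players
  counts <- matched[, .N, by = CoachID]
  check <- merge(coaches[, .(CoachID, NumPlayers)], counts,
                 by = "CoachID", all.x = TRUE)
  check[is.na(N), N := 0L]          # coaches with no matched rows have 0 players
  sizesOk <- all(check$N == check$NumPlayers)
  # Requirement 2: no player is tied to more than one coach
  noDuplicates <- !anyDuplicated(matched$PlayerID)
  sizesOk && noDuplicates
}

coaches <- data.table(CoachID = c(1, 2, 3), CoachName = c("Bob", "Sue", "John"),
                      NumPlayers = c(2, 3, 0))
exampleResult <- data.table(CoachID = c(1, 1, 2, 2, 2, 3),
                            PlayerID = c(3, 1, 2, 5, 6, NA))
isValidMatching(exampleResult, coaches)
# [1] TRUE
```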
asked May 06 '15 at 20:05 by Ben




2 Answers

You could sample without replacement from the player IDs, grabbing the total number of players you need:

set.seed(144)
(selections <- sample(players$PlayerID, sum(coaches$NumPlayers)))
# [1] 1 4 3 2 6
# (the exact draw depends on the R version; sample() changed its default algorithm in R 3.6.0)

Each player will have equal probability of being included in selections, and the ordering of that vector is random. Therefore you can just assign these players to each coaching slot:

data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
           PlayerID=selections)
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6
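The claim that every player is equally likely to be selected can be checked empirically. A minimal sketch, assuming the OP's 6 players and 5 coaching slots: repeat the draw many times and record which player is left out each time. Each player should be left out roughly 1/6 of the time. The variable names here are illustrative.

```r
playerIDs  <- c(1, 2, 3, 4, 5, 6)   # players$PlayerID in the question
numPlayers <- c(2, 3, 0)            # coaches$NumPlayers in the question
nSlots     <- sum(numPlayers)       # 5 slots, so exactly one player is left out

set.seed(1)
nReps <- 6000
# Each replicate draws the 5 selected players and keeps the one not drawn
leftOut <- replicate(nReps, setdiff(playerIDs, sample(playerIDs, nSlots)))
# Share of replicates in which each player was left out (about 0.167 each)
round(table(leftOut) / nReps, 3)
```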

If you wanted to have an NA value for any coaches with no player selections, you could do something like:

rbind(data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
                 PlayerID=selections),
      data.frame(CoachID=coaches$CoachID[coaches$NumPlayers==0],
                 PlayerID=rep(NA, sum(coaches$NumPlayers==0))))
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6
# 6       3       NA
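Since the question asks about data.table, here is a sketch of the same idea with that package (assuming it is installed): expand the coaches into one row per slot, fill the slots with a sample of player IDs, then left-join back onto the full coach list so a coach with `NumPlayers == 0` gets an `NA`. The names `slots` and `result` are illustrative.

```r
library(data.table)

coaches <- data.table(CoachID = c(1, 2, 3), CoachName = c("Bob", "Sue", "John"),
                      NumPlayers = c(2, 3, 0))
players <- data.table(PlayerID = c(1, 2, 3, 4, 5, 6),
                      PlayerName = c("Abe", "Bart", "Chad", "Dalton", "Egor", "Frank"))

set.seed(144)
# One row per coaching slot: coach 1 twice, coach 2 three times, coach 3 never
slots <- data.table(CoachID = rep(coaches$CoachID, coaches$NumPlayers))
# Fill the slots with distinct players drawn at random (no replacement)
slots[, PlayerID := sample(players$PlayerID, .N)]
# Left join from the full coach list, so coaches with no slots get PlayerID = NA
result <- merge(coaches[, .(CoachID)], slots, by = "CoachID", all.x = TRUE)
result
```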
answered Oct 27 '22 at 10:10 by josliber


Build the demand and supply sides, so to speak: one entry per open coaching slot and one per available player:

demand <- with(coaches,rep(CoachID,NumPlayers))
supply <- players$PlayerID

Then I'd do...

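randmatch <- function(demand,supply){
  n_demand  <- length(demand)
  n_supply  <- length(supply)
  # the smaller side determines how many matches can be made
  n_matches <- min(n_demand,n_supply)

  # index with sample.int() rather than sample(): sample(x, n) treats a
  # single number x as 1:x, which would break when a side has one element
  if (n_demand >= n_supply)   # all players used; coaching slots drawn at random
    data.frame(d=demand[sample.int(n_demand,n_matches)],s=supply)
  else                        # all slots filled; players drawn at random
    data.frame(d=demand,s=supply[sample.int(n_supply,n_matches)])
}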

Examples:

set.seed(1)
randmatch(demand,supply)    # some players unmatched, OP's example
randmatch(rep(1:3,1:3),1:4) # some coaches unmatched 

I'm not sure if this is a case the OP wanted to cover, though.
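A quick sketch of what each branch of `randmatch()` guarantees. It restates the function in its `sample()`-based form so the block runs on its own, then checks both example calls: in the OP's case every coaching slot is filled with distinct players; in the reverse case every player is used and no coach gets more players than the slots it asked for.

```r
# randmatch() restated from the answer so this block is self-contained
randmatch <- function(demand, supply) {
  n_demand  <- length(demand)
  n_supply  <- length(supply)
  n_matches <- min(n_demand, n_supply)
  if (n_demand >= n_supply) {
    data.frame(d = sample(demand, n_matches), s = supply)
  } else {
    data.frame(d = demand, s = sample(supply, n_matches))
  }
}

set.seed(1)
# OP's case: 5 coaching slots, 6 players -> every slot is filled
m1 <- randmatch(rep(c(1, 2, 3), c(2, 3, 0)), c(1, 2, 3, 4, 5, 6))
stopifnot(nrow(m1) == 5,
          !anyDuplicated(m1$s),                 # no player used twice
          all(table(m1$d) == c(2, 3)))          # Bob gets 2, Sue gets 3

# Reverse case: 6 slots, 4 players -> every player is used exactly once
m2 <- randmatch(rep(1:3, 1:3), 1:4)
stopifnot(nrow(m2) == 4,
          setequal(m2$s, 1:4),
          all(table(factor(m2$d, levels = 1:3)) <= 1:3))  # no coach over quota
```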


For the OP's desired output...

m <- randmatch(demand,supply)
merge(m,coaches,by.x="d",by.y="CoachID",all=TRUE)
#   d  s CoachName NumPlayers
# 1 1  2       Bob          2
# 2 1  6       Bob          2
# 3 2  3       Sue          3
# 4 2  4       Sue          3
# 5 2  1       Sue          3
# 6 3 NA      John          0

Similarly...

merge(m,players,by.x="s",by.y="PlayerID",all=TRUE)
#   s  d PlayerName
# 1 1  2        Abe
# 2 2  1       Bart
# 3 3  2       Chad
# 4 4  2     Dalton
# 5 5 NA       Egor
# 6 6  1      Frank
answered Oct 27 '22 at 09:10 by Frank