How to create a random matching between the rows of two data.tables (or data.frames)

Tags:

r

data.table

For this example, I'll use the data.table package.

Suppose you have a table of coaches

library(data.table)
coaches <- data.table(CoachID=c(1,2,3), CoachName=c("Bob","Sue","John"), NumPlayers=c(2,3,0))
coaches
   CoachID CoachName NumPlayers
1:       1       Bob          2
2:       2       Sue          3
3:       3      John          0

and a table of players

players <- data.table(PlayerID=c(1,2,3,4,5,6), PlayerName=c("Abe","Bart","Chad","Dalton","Egor","Frank"))
players
   PlayerID PlayerName
1:        1        Abe
2:        2       Bart
3:        3       Chad
4:        4     Dalton
5:        5       Egor
6:        6      Frank

You want to match each coach with a set of players such that

The number of players tied to each coach is defined by the NumPlayers field
No two coaches are tied to the same player
Players and coaches are matched randomly

How do you do this? The result should look something like:

exampleResult <- data.table(CoachID=c(1,1,2,2,2,3), PlayerID=c(3,1,2,5,6,NA))
exampleResult

   CoachID PlayerID
1:       1        3
2:       1        1
3:       2        2
4:       2        5
5:       2        6
6:       3       NA
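
To make the first two requirements concrete, here is a minimal sketch of a checker in base R. The function name `is_valid_matching` is hypothetical (it is not part of data.table), and randomness itself cannot be verified from a single result, so only the quota and uniqueness rules are tested.

```r
# Hypothetical helper: TRUE when `result` satisfies the two checkable rules.
is_valid_matching <- function(result, coaches) {
  # Rows with NA PlayerID mark coaches without players; ignore them here.
  assigned <- result[!is.na(result$PlayerID), ]

  # Rule 1: each coach has exactly NumPlayers players.
  counts <- table(factor(assigned$CoachID, levels = coaches$CoachID))
  quota_ok <- all(as.integer(counts) == coaches$NumPlayers)

  # Rule 2: no player is tied to more than one coach.
  unique_ok <- !anyDuplicated(assigned$PlayerID)

  quota_ok && unique_ok
}

coaches_df <- data.frame(CoachID = c(1, 2, 3), NumPlayers = c(2, 3, 0))
good <- data.frame(CoachID = c(1, 1, 2, 2, 2, 3),
                   PlayerID = c(3, 1, 2, 5, 6, NA))
bad  <- data.frame(CoachID = c(1, 1, 2, 2, 2, 3),
                   PlayerID = c(3, 3, 2, 5, 6, NA))  # player 3 used twice

check_good <- is_valid_matching(good, coaches_df)
check_bad  <- is_valid_matching(bad, coaches_df)
```

Here `check_good` is TRUE for the example result above and `check_bad` is FALSE because one player appears under two slots.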


asked May 06 '15 20:05

Ben

2 Answers

You could sample without replacement from the player IDs, grabbing the total number of players you need:

set.seed(144)
(selections <- sample(players$PlayerID, sum(coaches$NumPlayers)))
# [1] 1 4 3 2 6

Each player has an equal probability of being included in selections, and the order of that vector is itself random. You can therefore assign the selected players to the coaching slots in order:

data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
           PlayerID=selections)
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6
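
As a rough sanity check on the equal-probability claim, here is a small simulation sketch in base R. The 6 players and 5 slots mirror the example; the variable names and the number of repetitions are my own choices, not part of the original code.

```r
set.seed(1)
player_ids <- 1:6     # same IDs as players$PlayerID in the question
n_slots    <- 5       # sum(coaches$NumPlayers) in the question
n_sims     <- 10000   # arbitrary number of repetitions

# Each column of `draws` is one simulated selection of 5 distinct players.
draws <- replicate(n_sims, sample(player_ids, n_slots))

# Share of simulations in which each player is selected at all.
inclusion <- sapply(player_ids, function(p) mean(colSums(draws == p) > 0))

# Share of simulations in which each player lands in the first slot.
first_slot <- sapply(player_ids, function(p) mean(draws[1, ] == p))
```

If the claim holds, every entry of `inclusion` should be close to 5/6 and every entry of `first_slot` close to 1/6, up to simulation noise.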

If you want a row with an NA PlayerID for every coach who has no players, you could do something like:

rbind(data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
                 PlayerID=selections),
      data.frame(CoachID=coaches$CoachID[coaches$NumPlayers==0],
                 PlayerID=rep(NA, sum(coaches$NumPlayers==0))))
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6
# 6       3       NA
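
The steps above can be wrapped in a single function. This is only a sketch in base R: the name `assign_players` is hypothetical, and stopping with an error when there are too few players is an assumption on my part, since the question does not say what should happen in that case.

```r
# Hypothetical wrapper around the sampling and rbind steps shown above.
assign_players <- function(coaches, players) {
  n_needed <- sum(coaches$NumPlayers)
  # Assumption: fail loudly instead of silently leaving slots empty.
  stopifnot(n_needed <= nrow(players))

  # Sample row positions rather than IDs: sample(x, n) treats a single
  # number x as the range 1:x, which misbehaves for a one-row table.
  picked <- players$PlayerID[sample.int(nrow(players), n_needed)]

  matched <- data.frame(CoachID  = rep(coaches$CoachID, coaches$NumPlayers),
                        PlayerID = picked)

  # Coaches with zero slots get one row with an NA player.
  idle <- coaches$CoachID[coaches$NumPlayers == 0]
  unmatched <- data.frame(CoachID  = idle,
                          PlayerID = rep(NA_real_, length(idle)))
  rbind(matched, unmatched)
}

coaches_df <- data.frame(CoachID = c(1, 2, 3),
                         CoachName = c("Bob", "Sue", "John"),
                         NumPlayers = c(2, 3, 0))
players_df <- data.frame(PlayerID = c(1, 2, 3, 4, 5, 6),
                         PlayerName = c("Abe", "Bart", "Chad",
                                        "Dalton", "Egor", "Frank"))

set.seed(144)
result <- assign_players(coaches_df, players_df)
```

The function accepts data.frames or data.tables, since it only uses `$` column access and `nrow()`, and it always returns a plain data.frame.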


answered Oct 27 '22 10:10

josliber

Get demand and supply on each side, so to speak:

demand <- with(coaches,rep(CoachID,NumPlayers))
supply <- players$PlayerID

Then I'd do...

randmatch <- function(demand,supply){
  n_demand  <- length(demand)
  n_supply  <- length(supply)
  n_matches <- min(n_demand,n_supply)

  # Sample positions, not values: sample(x, n) treats a single number x as 1:x.
  if (n_demand >= n_supply) 
    data.frame(d=demand[sample.int(n_demand,n_matches)],s=supply)
  else 
    data.frame(d=demand,s=supply[sample.int(n_supply,n_matches)])
}

Examples:

set.seed(1)
randmatch(demand,supply)    # some players unmatched, OP's example
randmatch(rep(1:3,1:3),1:4) # some coaches unmatched

I'm not sure the second case (more coaching slots than players) is one the OP wanted to cover, though.

For the OP's desired output...

m <- randmatch(demand,supply)
merge(m,coaches,by.x="d",by.y="CoachID",all=TRUE)
#   d  s CoachName NumPlayers
# 1 1  2       Bob          2
# 2 1  6       Bob          2
# 3 2  3       Sue          3
# 4 2  4       Sue          3
# 5 2  1       Sue          3
# 6 3 NA      John          0

Similarly...

merge(m,players,by.x="s",by.y="PlayerID",all=TRUE)
#   s  d PlayerName
# 1 1  2        Abe
# 2 2  1       Bart
# 3 3  2       Chad
# 4 4  2     Dalton
# 5 5 NA       Egor
# 6 6  1      Frank

answered Oct 27 '22 09:10

Frank
