Generate all possible combinations of rows in R?

Tags:

r

Let's say I have two dataframes, students and teachers.

students <- data.frame(name = c("John", "Mary", "Sue", "Mark", "Gordy", "Joey", "Marge", "Sheev", "Lisa"),
                   height = c(111, 93, 99, 107, 100, 123, 104, 80, 95),
                   smart = c("no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"))
teachers <- data.frame(name = c("Ben", "Craig", "Mindy"),
                   height = c(130, 101, 105),
                   smart = c("yes", "yes", "yes"))

I want to generate every possible student-teacher pairing while keeping the accompanying info; in other words, all combinations of rows from the data frames "students" and "teachers". This is easy to do with a loop and cbind, but on a massive data frame it takes forever. Help an R newbie out: what's the fastest way to do this?

Edit: If this isn't clear, I want the output to have the following format:

rbind(
  cbind(students[1, ], teachers[1, ]),
  cbind(students[1, ], teachers[2, ]),
  # ... and so on for every student/teacher pair ...
  cbind(students[n, ], teachers[m, ]))  # n = nrow(students), m = nrow(teachers)
mowglis_diaper asked Aug 04 '17 03:08


2 Answers

You can expand each pair of matching columns with expand.grid and then bind the results side by side:

do.call(cbind.data.frame, Map(expand.grid, teacher = teachers, students = students))

   name.teacher name.students height.teacher height.students smart.teacher smart.students
1           Ben          John            130             111           yes             no
2         Craig          John            101             111           yes             no
3         Mindy          John            105             111           yes             no
4           Ben          Mary            130              93           yes             no
5         Craig          Mary            101              93           yes             no
6         Mindy          Mary            105              93           yes             no
:            :            :                :               :            :              :
:            :            :                :               :            :              :
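
The one-liner above can be unpacked into two steps to see what each call contributes. This is only a sketch of the same technique using the question's sample data; the intermediate names pieces and combined are made up for illustration.

```r
# Sample data from the question.
students <- data.frame(name = c("John", "Mary", "Sue", "Mark", "Gordy", "Joey", "Marge", "Sheev", "Lisa"),
                       height = c(111, 93, 99, 107, 100, 123, 104, 80, 95),
                       smart = c("no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"))
teachers <- data.frame(name = c("Ben", "Craig", "Mindy"),
                       height = c(130, 101, 105),
                       smart = c("yes", "yes", "yes"))

# Step 1: Map() walks both data frames column by column and calls
# expand.grid(teacher = <teacher column>, students = <student column>)
# for each pair. This assumes both data frames have the same columns
# in the same order.
pieces <- Map(expand.grid, teacher = teachers, students = students)

# pieces is a list named after the teacher columns (name, height, smart);
# each element is a data frame with one row per teacher/student pair.
str(pieces, max.level = 1)

# Step 2: bind the pieces side by side. The list names become column
# prefixes, giving name.teacher, name.students, height.teacher, ...
combined <- do.call(cbind.data.frame, pieces)
head(combined)
```

Because every piece is expanded in the same order (teacher varying fastest), the rows line up across pieces, which is what makes the final cbind safe.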
KU99 answered Nov 13 '22 08:11


Regarding "and keep the accompanying info":

I would recommend not doing this. There is no need to have everything in a single object.

To combine just the teacher and student names, use expand.grid:

res = expand.grid(teacher_name = teachers$name, student_name = students$name)

To merge in the other data (which I would recommend not doing until necessary):

res[, paste("teacher", c("height", "smart"), sep="_")] <- 
  teachers[match(res$teacher_name, teachers$name), c("height","smart")]

res[, paste("student", c("height", "smart"), sep="_")] <- 
  students[match(res$student_name, students$name), c("height","smart")]

This gives

head(res)

  teacher_name student_name teacher_height teacher_smart student_height student_smart
1          Ben         John            130           yes            111            no
2        Craig         John            101           yes            111            no
3        Mindy         John            105           yes            111            no
4          Ben         Mary            130           yes             93            no
5        Craig         Mary            101           yes             93            no
6        Mindy         Mary            105           yes             93            no
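
The idea of deferring the merge can be pushed one step further by expanding row indices instead of names, which also works when names are not unique. This is a hedged sketch rather than part of the answer above; idx, t_part, s_part and pairs are hypothetical names, and the teacher_/student_ prefixes are chosen only to mirror the column names shown above.

```r
# Sample data from the question.
students <- data.frame(name = c("John", "Mary", "Sue", "Mark", "Gordy", "Joey", "Marge", "Sheev", "Lisa"),
                       height = c(111, 93, 99, 107, 100, 123, 104, 80, 95),
                       smart = c("no", "no", "yes", "no", "yes", "yes", "no", "yes", "no"))
teachers <- data.frame(name = c("Ben", "Craig", "Mindy"),
                       height = c(130, 101, 105),
                       smart = c("yes", "yes", "yes"))

# Every combination of row positions; the first column varies fastest.
idx <- expand.grid(teacher = seq_len(nrow(teachers)),
                   student = seq_len(nrow(students)))

# Only when the full table is needed: pull the rows by index and
# prefix the column names so the two halves stay distinguishable.
t_part <- setNames(teachers[idx$teacher, ], paste0("teacher_", names(teachers)))
s_part <- setNames(students[idx$student, ], paste0("student_", names(students)))

pairs <- cbind(t_part, s_part)
rownames(pairs) <- NULL  # drop the repeated row labels left by subsetting
head(pairs)
```

Until the last step, only the small integer table idx has to be kept around, so the full combined table is built once and only if it is actually needed.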
Frank answered Nov 13 '22 10:11