Cartesian product data frame

Tags: dataframe, r

I have three or more independent variables represented as R vectors, like so:

    A <- c(1,2,3)
    B <- factor(c('x','y'))
    C <- c(0.1,0.5)

and I want to take the Cartesian product of all of them and put the result into a data frame, like this:

    A B   C
    1 x 0.1
    1 x 0.5
    1 y 0.1
    1 y 0.5
    2 x 0.1
    2 x 0.5
    2 y 0.1
    2 y 0.5
    3 x 0.1
    3 x 0.5
    3 y 0.1
    3 y 0.5

I can do this by manually writing out calls to rep:

    d <- data.frame(A = rep(A, each = length(B) * length(C)),
                    B = rep(B, times = length(A), each = length(C)),
                    C = rep(C, times = length(A) * length(B)))

but there must be a more elegant way to do it, yes? product in itertools does part of the job, but I can't find any way to absorb the output of an iterator and put it into a data frame. Any suggestions?

p.s. The next step in this calculation looks like

d$D <- f(d$A, d$B, d$C) 

so if you know a way to do both steps at once, that would also be helpful.

zwol asked Nov 29 '10 23:11



2 Answers

You can use expand.grid(A, B, C)
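As a minimal sketch of that call, using the vectors from the question: naming the arguments gives the columns the names A, B and C, and because expand.grid varies its first argument fastest, an explicit sort is needed to reproduce the row order shown in the question. The sort step is an addition here, not part of the original answer.

```r
# Vectors from the question
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Named arguments become the column names of the result
d <- expand.grid(A = A, B = B, C = C)

# expand.grid varies the first argument fastest; sort so that
# A varies slowest and C fastest, as in the question's table
d <- d[order(d$A, d$B, d$C), ]
rownames(d) <- NULL

nrow(d)  # 3 * 2 * 2 = 12 rows
```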


EDIT: An alternative to using do.call for the second step is the function mdply from the plyr package:

    library(plyr)
    d = expand.grid(x = A, y = B, z = C)
    d = mdply(d, f)

To illustrate its usage with a trivial function, paste, you can try:

    d = mdply(d, 'paste', sep = '+')
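The do.call approach mentioned in the edit is not shown in the answer. A rough base-R sketch of what it might look like follows, assuming the asker's f is vectorized and takes arguments named A, B and C; the f defined here is a hypothetical stand-in built on paste, not the asker's real function.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Hypothetical stand-in for the asker's f; assumed to be vectorized
f <- function(A, B, C) paste(A, B, C, sep = "+")

d <- expand.grid(A = A, B = B, C = C)

# A data frame is a list of columns, so do.call passes each column
# to f as an argument named after that column
d$D <- do.call(f, d)

head(d, 3)
```

This relies on the column names of d matching the argument names of f; with unnamed columns, the arguments would instead need to be matched by position.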
Ramnath answered Sep 21 '22 01:09


The merge function, which operates on data frames, is helpful in this case.

It can produce various joins (in SQL terminology), of which the Cartesian product is a special case.

You have to convert the variables to data frames first, because merge takes data frames as arguments.

Something like this will do:

    A.B = merge(data.frame(A=A), data.frame(B=B), by=NULL)
    A.B.C = merge(A.B, data.frame(C=C), by=NULL)

The only thing to watch for is that the rows do not come out in the order you showed. You can sort them afterwards if the order matters.
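A small sketch of the whole approach, including the sort, using the vectors from the question. The use of order to restore the question's row order is an illustration added here, not part of the original answer.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# by = NULL makes merge return the Cartesian product of its inputs
A.B   <- merge(data.frame(A = A), data.frame(B = B), by = NULL)
A.B.C <- merge(A.B, data.frame(C = C), by = NULL)

# Sort so that A varies slowest and C fastest, as in the question
res <- A.B.C[order(A.B.C$A, A.B.C$B, A.B.C$C), ]
rownames(res) <- NULL

nrow(res)  # 3 * 2 * 2 = 12 rows
```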

    merge(x, y, by = intersect(names(x), names(y)),
          by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
          sort = TRUE, suffixes = c(".x",".y"),
          incomparables = NULL, ...)

"If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"

See the merge documentation for details: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html

misssprite answered Sep 21 '22 01:09