I have three or more independent variables represented as R vectors, like so:
A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(0.1,0.5)
and I want to take the Cartesian product of all of them and put the result into a data frame, like this:
A B C
1 x 0.1
1 x 0.5
1 y 0.1
1 y 0.5
2 x 0.1
2 x 0.5
2 y 0.1
2 y 0.5
3 x 0.1
3 x 0.5
3 y 0.1
3 y 0.5
I can do this by manually writing out calls to rep:
d <- data.frame(A = rep(A, each=length(B)*length(C)),
                B = rep(B, times=length(A), each=length(C)),
                C = rep(C, times=length(A)*length(B)))
but there must be a more elegant way to do it, yes? product in itertools does part of the job, but I can't find any way to absorb the output of an iterator and put it into a data frame. Any suggestions?
p.s. The next step in this calculation looks like
d$D <- f(d$A, d$B, d$C)
so if you know a way to do both steps at once, that would also be helpful.
You can use expand.grid(A, B, C)
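As a rough sketch with the vectors from the question: naming the arguments gives the columns the names A, B and C (unnamed arguments come out as Var1, Var2, Var3). Because expand.grid varies its first argument fastest, the order() call below is one way to get the row order shown in the question; that re-sorting step is an addition for illustration, not something expand.grid does for you.

```r
# Vectors from the question
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Named arguments become the column names of the result
d <- expand.grid(A = A, B = B, C = C)

# expand.grid varies the first column fastest; sort so that A varies slowest
d <- d[order(d$A, d$B, d$C), ]
rownames(d) <- NULL  # renumber the rows after sorting
```

If the row order does not matter for the next step, the sorting lines can simply be dropped.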
EDIT: An alternative to using do.call to achieve the second part is the function mdply from the package plyr:
library(plyr)
d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)
To illustrate its usage using a trivial function 'paste', you can try
d = mdply(d, 'paste', sep = '+');
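For comparison, here is a minimal base R sketch that does both steps without plyr. It assumes f is vectorised over its arguments; the f below is a hypothetical stand-in built on paste, not the function from the question. If f can only handle one combination at a time, mapply can apply it row by row instead.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Hypothetical stand-in for the real f; vectorised over its arguments
f <- function(A, B, C) paste(A, B, C, sep = "+")

# Build the grid and add column D in one expression (needs a vectorised f)
d <- transform(expand.grid(A = A, B = B, C = C), D = f(A, B, C))

# Row-by-row alternative for an f that is not vectorised
d$D2 <- mapply(f, d$A, d$B, d$C)
```

Both columns should hold the same values here; D2 is only included to show the mapply form side by side.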
There is a function for manipulating data frames, merge, that is helpful in this case.
It can produce various joins (in SQL terminology), of which the Cartesian product is a special case.
You have to convert the variables to data frames first, because it takes data frames as parameters.
So something like this will do:
A.B = merge(data.frame(A=A), data.frame(B=B), by=NULL)
A.B.C = merge(A.B, data.frame(C=C), by=NULL)
The only caveat is that the rows are not sorted the way you depicted. You can sort them manually if you wish.
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"),
      incomparables = NULL, ...)
"If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"
See this URL for details: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html
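Since the question mentions three or more variables, the pairwise merge can be folded over a list with Reduce. This is only a sketch: the list name parts is made up for illustration, and the order() call is just one way to do the manual sorting mentioned above.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# One single-column data frame per variable ("parts" is an illustrative name)
parts <- list(data.frame(A = A), data.frame(B = B), data.frame(C = C))

# by = NULL makes each merge a Cartesian product; Reduce chains the merges
d <- Reduce(function(x, y) merge(x, y, by = NULL), parts)

# Sort so that A varies slowest and C fastest, then renumber the rows
d <- d[order(d$A, d$B, d$C), ]
rownames(d) <- NULL
```

Adding a fourth variable then only means appending another data frame to the list.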