
How to do a matrix calculation to get the cross products of variables

I have some data in R with quite a few columns. Please use the following as an example:

x = replicate(5, rnorm(10)) 
colnames(x) = c('a','b','c','d','e')

I want to calculate the cross products and ratios of every pair of columns and append them to the end of the table. I also want to name the new columns so they show which columns they were calculated from.

The result should have extra columns like:

cp_a_b,
cp_a_c,
cp_a_d,
cp_a_e,
cp_b_c,
cp_b_d,
cp_b_e,
cp_c_d,
cp_c_e,
cp_d_e,
ratio_a_b,
ratio_a_c,
ratio_a_d,
ratio_a_e,
ratio_b_c,
ratio_b_d,
ratio_b_e,
ratio_c_d,
ratio_c_e,
ratio_d_e,

where cp is the cross product and ratio is the ratio of the two columns. I want to do this as a matrix calculation rather than a loop, so that it is quick.

shecode asked Sep 07 '15 22:09



1 Answer

I'm still new at R, but here's a stab at it anyway. For fun! I have no idea if there's any hope for it to be fast. Probably it's quite naive...

First, an example matrix x (num_observations rows by num_features columns) of small random integers:

num_features <- 5
num_observations <- 20
features <- letters[1:num_features]

x <- replicate(num_features, sample(1:10, num_observations, replace = TRUE))

colnames(x) <- features

All combinations of feature pairs:

combinations <- combn(features, 2)
num_combinations <- ncol(combinations)
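
As a small illustration of what combn returns (using a hypothetical three-letter vector rather than the features above): the result is a matrix with one column per pair, which is why the later code reads the pair members from rows 1 and 2.

```r
# Hypothetical three-element input, for illustration only
pairs <- combn(c("a", "b", "c"), 2)

# Each column of `pairs` is one pair: row 1 and row 2 hold its two members
print(pairs)
#      [,1] [,2] [,3]
# [1,] "a"  "a"  "b"
# [2,] "b"  "c"  "c"
```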

For each feature pair, we'll multiply the corresponding columns in x. First, reserve space for a new matrix where the multiplied columns will end up:

y <- matrix(NA, ncol = num_combinations, nrow = num_observations)
cn <- rep("?", num_combinations) # column names of new features

Multiplying the column combinations:

for (i in seq_len(num_combinations))
{
  cn[i] <- paste(combinations[1,i], combinations[2,i], sep = ".")
  y[,i] <- x[,combinations[1,i]] * x[,combinations[2,i]]
}
colnames(y) <- cn

Finally merging the original matrix and the additional features:

x <- cbind(x, y)

This only handles multiplication for simplicity; additional features created using division are of course similar.
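
As a minimal sketch of the division case without an explicit loop, the pairs can also be taken as column indices and used to subset x directly, so that products and ratios for all pairs are computed in one element-wise operation each. This is an alternative to the loop above, not a tested benchmark; the cp_ and ratio_ prefixes follow the naming in the question, and names such as idx, left, right and result are made up for illustration.

```r
set.seed(1)  # illustrative data only, same shape as in the question
x <- replicate(5, rnorm(10))
colnames(x) <- c("a", "b", "c", "d", "e")

# Column-index pairs: row 1 is the first column of each pair, row 2 the second
idx <- combn(ncol(x), 2)
left <- x[, idx[1, ], drop = FALSE]
right <- x[, idx[2, ], drop = FALSE]

# Element-wise product and ratio for every pair at once
cp <- left * right
ratio <- left / right

# Name the new columns after the columns they were computed from
pair_names <- paste(colnames(x)[idx[1, ]], colnames(x)[idx[2, ]], sep = "_")
colnames(cp) <- paste0("cp_", pair_names)
colnames(ratio) <- paste0("ratio_", pair_names)

result <- cbind(x, cp, ratio)
```

Here result keeps the original five columns and appends ten cp_ columns followed by ten ratio_ columns, in the order listed in the question.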

UPDATE

A nice approach suggested by @nongkrong in the comments forgoes the explicit loop and simply does:

y <- combn(split(x, col(x)), 2, FUN = function(cols) cols[[1]] * cols[[2]])
x <- cbind(x, y)

It doesn't explicitly set the column names of the new features, but it is more elegant and more readable. In some quick timings I did, it was also about 30% faster!
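
If the names are wanted as well, one possible way (a sketch, not part of the original suggestion) is a second combn call over the column names, pasting each pair together with the same "." separator used in the loop version. The example rebuilds a fresh x so that it runs on its own.

```r
# Fresh example matrix, same shape as above
x <- replicate(5, sample(1:10, 20, replace = TRUE))
colnames(x) <- letters[1:5]

# Products of every column pair, as in the comment-suggested approach
y <- combn(split(x, col(x)), 2, FUN = function(cols) cols[[1]] * cols[[2]])

# Build matching names: each pair of column names is collapsed into "a.b" etc.
colnames(y) <- combn(colnames(x), 2, FUN = paste, collapse = ".")

x_extended <- cbind(x, y)
```

Both combn calls enumerate the pairs in the same order, so the names line up with the computed columns.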

WhiteViking answered Sep 30 '22 05:09