Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apply over two data frames

Tags:

r

apply

I'm using R, and I have two data.frames, A and B. They both have 6 rows, but A has 25000 columns (genes), and B has 30 columns. I'd like to apply a function with two arguments f(x,y) where x is every column of A and y is every column of B. So far it looks like this:

i = 1
for (x in A){
    j = 1
    for (y in B){
        out[i,j] <- f(x,y)
        j = j + 1
    }
    i = i + 1
}

I have two issues with this: from my Python programming I associate keeping track of counters like this as crufty, and from my R programming I am nervous of for loops. However, I can't quite see how to apply apply (or even if I should apply apply) to this problem and was hoping someone might enlighten me. I need to treat f() as atomic (it's actually cor.test()) for now.

like image 793
Mike Dewar Avatar asked Aug 24 '10 14:08

Mike Dewar


2 Answers

Since you are using data frames, it might be faster to use lapply or sapply to do this (specially given the scope of your data frames). For example,

x <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), col3=c(9,10,11,12))
y <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))
bl <- lapply(x, function(u){
   lapply(y, function(v){
       f(u,v) # Function with column from x and column from y as inputs
   })
})
out = matrix(unlist(bl), ncol=ncol(y), byrow=T)
like image 190
Abhijit Avatar answered Sep 23 '22 12:09

Abhijit


Some data

nrows <- 6
A <- data.frame(a = runif(nrows), b = runif(nrows), c = runif(nrows))
B <- data.frame(z = rnorm(nrows), y = rnorm(nrows))

The trick: remember columns with expand.grid

counter <- expand.grid(seq_along(A), seq_along(B))
f <- function(x) 
{
  cor.test(A[, x["Var1"]], B[, x["Var2"]])$estimate
}

Now we only need 1 call to apply.

stats <- apply(counter, 1, f)
names(stats) <- paste(names(A)[counter$Var1], names(B)[counter$Var2], sep = ",")
stats
like image 39
Richie Cotton Avatar answered Sep 26 '22 12:09

Richie Cotton