Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does calculating correlation between two dataframes require a loop?

I have a set of large dataframes that look like A and B:

A <- data.frame(A1=c(1,2,3,4,5),B1=c(6,7,8,9,10),C1=c(11,12,13,14,15 ))

  A1 B1 C1
1  1  6 11
2  2  7 12
3  3  8 13
4  4  9 14
5  5 10 15

B <- data.frame(A2=c(6,7,7,10,11),B2=c(2,1,3,8,11),C2=c(1,5,16,7,8))

  A2 B2 C2
1  6  2  1
2  7  1  5
3  7  3 16
4 10  8  7
5 11 11  8

I would like to create a vector (C) that denotes the Pearson correlation between A1 & A2, B1 & B2, and C1 & C2. In this case, for example, those correlations are:

[1] 0.95 0.92 0.46
like image 467
Joe Ripberger Avatar asked Aug 30 '13 14:08

Joe Ripberger


People also ask

How do you find the correlation between two data frames?

corrwith() is used to compute pairwise correlation between rows or columns of two DataFrame objects. If the shape of two dataframe object is not same then the corresponding correlation value will be a NaN value. Note: The correlation of a variable with itself is 1.

How to check correlation between two columns in Pandas?

Correlation Between Two Columns of DataFrame You can see the correlation between two columns of pandas DataFrame by using DataFrame. corr() function. The pandas. DataFrame.

How do you find the correlation between two sets of data in R?

The correlation between 2 variables is found with the cor() function. Note that the correlation between variables X and Y is equal to the correlation between variables Y and X so the order of the variables in the cor() function does not matter.


3 Answers

cor accepts two data.frames:

A<-data.frame(A1=c(1,2,3,4,5),B1=c(6,7,8,9,10),C1=c(11,12,13,14,15 ))

B<-data.frame(A2=c(6,7,7,10,11),B2=c(2,1,3,8,11),C2=c(1,5,16,7,8))

cor(A,B)

#           A2        B2       C2
# A1 0.9481224 0.9190183 0.459588
# B1 0.9481224 0.9190183 0.459588
# C1 0.9481224 0.9190183 0.459588

diag(cor(A,B))
#[1] 0.9481224 0.9190183 0.4595880

Edit:

Here are some benchmarks:

Unit: microseconds
                   expr     min       lq   median       uq      max neval
        diag(cor(A, B)) 230.292 238.4225 243.0115 255.0295  352.955   100
      mapply(cor, A, B) 267.076 281.5120 286.8030 299.5260  375.087   100
 unlist(Map(cor, A, B)) 250.053 259.1045 264.5635 275.9035 1146.140   100

Edit2:

And some better benchmarks using

set.seed(42)
A <- as.data.frame(matrix(rnorm(10*n),ncol=n))
B <- as.data.frame(matrix(rnorm(10*n),ncol=n))

enter image description here

However, I should probably mention that these benchmarks strongly depend on the number of rows.

Edit3: Since I was asked for the benchmarking code, here it is.

b <- sapply(2^(1:12), function(n) {
    set.seed(42)
    A <- as.data.frame(matrix(rnorm(10*n),ncol=n))
    B <- as.data.frame(matrix(rnorm(10*n),ncol=n))

    require(microbenchmark)
    res <- print(microbenchmark(
                   diag(cor(A,B)),
                   mapply(cor, A, B),
                   unlist(Map(cor,A,B)),
                   times=10
                 ),unit="us")
    res$median
})

b <- t(b)

matplot(x=1:12,log10(b),type="l",
        ylab="log10(median [µs])", 
        xlab="log2(n)",col=1:3,lty=1)
legend("topleft", legend=c("diag(cor(A, B))", 
                           "mapply(cor, A, B)",
                           "unlist(Map(cor,A,B))"),lty=1, col=1:3)
like image 53
Roland Avatar answered Oct 21 '22 19:10

Roland


You can use friend of apply functions, Map, for that.

Map(function(x,y) cor(x,y),A,B)
$A1
[1] 0.9481224

$B1
[1] 0.9190183

$C1
[1] 0.459588

If you want the output as vector as suggested by @Jilber :

unlist(Map(function(x,y) cor(x,y),A,B))
       A1        B1        C1 
0.9481224 0.9190183 0.4595880

Or you can just use:

 unlist(Map(cor,A,B))
       A1        B1        C1 
0.9481224 0.9190183 0.459588
like image 27
Metrics Avatar answered Oct 21 '22 18:10

Metrics


Another alternative you can use mapply function

> mapply(function(x,y) cor(x,y),A,B)
       A1        B1        C1 
0.9481224 0.9190183 0.4595880 

Or just mapply(cor, A, B) as suggested by @Aaron.

like image 21
Jilber Urbina Avatar answered Oct 21 '22 19:10

Jilber Urbina