Canonical Correlation Analysis in Python with sklearn

I'm trying to use sklearn to carry out Canonical Correlation Analysis (CCA). I'm starting with the simple example that is included in the manual:

from sklearn.cross_decomposition import CCA
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
cca = CCA(n_components=1)
cca.fit(X, Y)

X_c, Y_c = cca.transform(X, Y)

I understand that cca.x_weights_ gives me the "canonical coefficients", i.e., the linear combinations of the original X variables (the columns of matrices "A" and "B" returned by MATLAB). However, where are the "canonical correlations", i.e., the maximum correlations reached when applying the transformation given by the canonical coefficients (the vector "r" returned by MATLAB)? Is it possible to get those in Python as well?

manu asked Oct 10 '14 11:10

1 Answer

You can calculate the correlations from the outputs of .transform, using either numpy or scipy. I prefer scipy's stats module:

import scipy.stats

X_c, Y_c = cca.transform(X, Y)
# pearsonr expects 1-D inputs, so pass the single component column of each score matrix
corrcoef, p_value = scipy.stats.pearsonr(X_c[:, 0], Y_c[:, 0])

Clearly, since in your case you don't have enough samples (n < p + q: you have n = 4 samples, p = 3 X variables, and q = 2 Y variables), your correlation is 1.
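
To see the sample-size effect without going through sklearn, here is a rough sketch of the textbook QR + SVD formulation of canonical correlations, using numpy only. This is a different computation from sklearn's iterative fit, and canonical_correlations is a hypothetical helper that assumes the centered X and Y both have full column rank. It returns all min(p, q) correlations at once, which should be comparable to the vector "r" you mention:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Hypothetical helper: canonical correlations via QR + SVD.

    Assumes the centered X and Y both have full column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable (column).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # i.e. the canonical correlations, in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# The 4-sample data from the question: n = 4 < p + q = 5.
X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [3., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
r_small = canonical_correlations(X, Y)

# Independent random data with many more samples than variables
# (made-up data, for contrast only).
rng = np.random.default_rng(0)
X_big = rng.normal(size=(200, 3))
Y_big = rng.normal(size=(200, 2))
r_big = canonical_correlations(X_big, Y_big)

print(r_small, r_big)
```

With only 4 samples, the centered X columns already span every centered direction, so both correlations come out at (numerically) 1 regardless of what Y contains. With 200 independent samples they should be much smaller.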

idnavid answered Sep 18 '22 14:09
