I'm trying to use sklearn to carry out Canonical Correlation Analysis (CCA). I'm starting with the simple example that is included in the manual:
from sklearn.cross_decomposition import CCA
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
cca = CCA(n_components=1)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)
I understand that in cca.x_weights_ I get the "canonical coefficients", i.e., the linear combinations of the original X variables (the columns of matrices "A" and "B" returned by MATLAB). However, where are the "canonical correlations", i.e., the maximum correlations reached when applying the transformation given by the canonical coefficients (the vector "r" returned by MATLAB)? Is it possible to get those in Python as well?
In this article we introduce Pyrcca, an open-source Python package for performing canonical correlation analysis (CCA). CCA is a multivariate analysis method for identifying relationships between sets of variables.
Carry out a canonical correlation analysis using SAS (Minitab does not have this functionality); Assess how many canonical variate pairs should be considered; Interpret canonical variate scores; Describe the relationships between variables in the first set with variables in the second set.
Ordinary correlation between two multidimensional variables measures the similarity between those variables, whereas canonical correlation analysis (CCA) finds two linear transforms, one per variable, that maximize the correlation between the resulting projections.
A canonical correlation is the correlation between a pair of canonical (latent) variates, one derived from each set of variables. By convention one set is often treated as the independent variables and the other as the dependent variables, although the method itself is symmetric in the two sets.
You can calculate the correlations from the outputs of .transform. This can be done with either numpy or scipy. I prefer scipy's stats module:
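import scipy.stats
X_c, Y_c = cca.transform(X, Y)
# pearsonr expects 1-D inputs, so pass the single component column of each score matrix
corrcoef, p_value = scipy.stats.pearsonr(X_c[:, 0], Y_c[:, 0])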
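The same correlation can be computed with numpy, which is convenient when more than one component was requested. The following is a minimal sketch: the helper name component_correlations is hypothetical, and the two small arrays are made-up stand-ins for the score matrices that cca.transform would return.

```python
import numpy as np

def component_correlations(x_scores, y_scores):
    """Pearson correlation between matching columns of two score matrices.

    Hypothetical helper, not part of sklearn: column k of x_scores is
    paired with column k of y_scores.
    """
    x_scores = np.asarray(x_scores, dtype=float)
    y_scores = np.asarray(y_scores, dtype=float)
    # Accept 1-D input (a single component) as well as 2-D input.
    if x_scores.ndim == 1:
        x_scores = x_scores.reshape(-1, 1)
    if y_scores.ndim == 1:
        y_scores = y_scores.reshape(-1, 1)
    n_components = x_scores.shape[1]
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
    return np.array([
        np.corrcoef(x_scores[:, k], y_scores[:, k])[0, 1]
        for k in range(n_components)
    ])

# Made-up stand-in scores (4 samples, 2 components), for illustration only.
demo_x = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
demo_y = np.array([[2.0, 1.0], [4.0, 0.0], [6.0, 1.0], [8.0, 0.0]])
demo_r = component_correlations(demo_x, demo_y)
print(demo_r)
```

In practice you would pass X_c and Y_c from cca.transform(X, Y) instead of the stand-in arrays.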
Since in your case you don't have enough samples (i.e., n < p+q; here n = 4, p = 3 and q = 2), your correlation is 1.
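To see the effect of the sample size without going through sklearn, the canonical correlations can also be computed directly with numpy. The sketch below uses a standard QR-plus-SVD formulation and is not sklearn's algorithm: it assumes both centered matrices have full column rank, and the function name canonical_correlations and the synthetic "large" data set are made up for illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations via QR + SVD (assumes full column rank after centering)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each column, as CCA works on mean-centered data.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered matrices.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against values marginally outside [0, 1] due to rounding.
    return np.clip(s, 0.0, 1.0)

# The 4-sample data from the question: n = 4 < p + q = 5.
X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [3., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
r_small = canonical_correlations(X, Y)

# Synthetic data with many more samples than variables (assumption, for contrast).
rng = np.random.default_rng(0)
X_big = rng.normal(size=(200, 3))
Y_big = 0.5 * X_big[:, :2] + rng.normal(size=(200, 2))
r_big = canonical_correlations(X_big, Y_big)

print(r_small)
print(r_big)
```

With only four samples, the centered X already spans every direction available to the centered Y, so the correlations for the question's data come out as 1 up to rounding. With 200 samples the values fall clearly below 1.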