In scikit-learn for Python, there is a module called cross_decomposition with a canonical correlation analysis (CCA) class. I have been trying to figure out how to give the class two 2-D arrays of shape (n, m) and get the first canonical correlation coefficient. Going off the documentation, a small example script is as follows.
from sklearn.cross_decomposition import CCA
import numpy as np
U = np.random.random_sample(500).reshape(100,5)
V = np.random.random_sample(500).reshape(100,5)
cca = CCA(n_components=1)
cca.fit(U, V)
cca.coef_.shape # (5,5)
U_c, V_c = cca.transform(U, V)
U_c.shape # (100,1)
V_c.shape # (100,1)
I do not really understand how to use this class to get the first canonical correlation between two matrices, which is all that I need. It seems generally directed towards classification and prediction problems, but I just need the first canonical correlation coefficient and nothing else. I know there are a few other posts somewhat similar to this, but the question remains unanswered and the best suggestion is to change to MATLAB, which is a non-solution. Any help is appreciated.
Given your transformed matrices U_c and V_c, you can indeed retrieve the canonical component correlations like you did, and more generally for a CCA with n_comp canonical components (CCs):
result = np.corrcoef(U_c.T, V_c.T).diagonal(offset=n_comp)
Now, you do not have to transform your data yourself; it has already been done during the fitting procedure, at least for the training data. The scores are stored in the CCA instance by scikit-learn, so:
score = np.diag(np.corrcoef(cca.x_scores_, cca.y_scores_, rowvar=False)[:n_comp, n_comp:])
will give the same result: a vector of n_comp scalar values, corresponding to the scores, i.e. the correlations between each pair of canonical components.
Well, with some help looking at the source code in pyrcca I managed to create this snippet of code to get out the first canonical correlation.
cca = CCA(n_components=1)
U_c, V_c = cca.fit_transform(U, V)
result = np.corrcoef(U_c.T, V_c.T)[0,1]
Hope this helps someone else.
Note: For anyone curious, the pyrcca package mentioned above runs slightly faster than scikit-learn's implementation, due to heavier use of multi-core processing. It also implements kernel CCA, which sklearn does not.