The Partial Least Squares (PLS) algorithm is implemented in the scikit-learn library, as documented here: http://scikit-learn.org/0.12/auto_examples/plot_pls.html When y is a binary vector, a variant of this algorithm is used: Partial Least Squares Discriminant Analysis (PLS-DA). Does the PLSRegression class in sklearn.pls also implement this binary case? If not, where can I find a Python implementation of it? In my binary case, I'm trying to use PLSRegression:
pls = PLSRegression(n_components=10)
pls.fit(x, y)
x_r, y_r = pls.transform(x, y, copy=True)
In the transform function, the code raises an exception at this line:
y_scores = np.dot(Yc, self.y_rotations_)
The error message is "ValueError: matrices are not aligned". Yc is the normalized y vector, and self.y_rotations_ = [1.]. In the fit function, self.y_rotations_ is set to np.ones(1) if the original y is univariate (y.shape[1] == 1).
Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection.
Mathematical operations of PLS regression and PLS-DA are nominally the same, with the major difference being the response that is predicted. In chromatographic applications, PLS-DA aims to predict sample class membership contained in matrix Y based on chromatographic data contained in matrix X.
Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier.
PLS-DA is really a "trick" for using PLS with categorical outcomes instead of the usual continuous vector/matrix. The trick consists of creating a dummy indicator matrix of zeros and ones that represents membership in each of the categories. So if you have a binary outcome to predict (e.g. male/female, yes/no, etc.), your dummy matrix will have TWO columns, one for membership in each category.
For example, consider the outcome gender for four people: 2 males and 2 females. The dummy matrix should be coded as:
import numpy as np
dummy=np.array([[1,1,0,0],[0,0,1,1]]).T
where each column represents membership in one of the two categories (male, female).
Then your model for the data in the variable Xdata (4 rows, any number of columns) would be:
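For more than a handful of samples, the same dummy matrix can be built from a list of labels rather than typed by hand. The following is a minimal sketch; the names `labels` and `categories` are illustrative and not part of the original answer.

```python
import numpy as np

# Hypothetical labels for the four people in the example above
labels = np.array(["male", "male", "female", "female"])

# Fix the column order explicitly: column 0 = male, column 1 = female
categories = np.array(["male", "female"])

# Compare every label against every category; True/False becomes 1/0
dummy = (labels[:, None] == categories[None, :]).astype(int)
print(dummy)
```

Each row contains exactly one 1, in the column of the category that sample belongs to.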
from sklearn.cross_decomposition import PLSRegression
myplsda = PLSRegression().fit(Xdata, dummy)
The predicted categories can be extracted by comparing the two indicator variables in mypred:
mypred= myplsda.predict(Xdata)
For each row/case the predicted gender is that with the highest predicted membership.
You can use the Linear Discriminant Analysis class in scikit-learn; it accepts integers for the y values:
LDA-SKLearn
Here is a short tutorial on how to use the LDA: sklearn LDA tutorial
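A minimal sketch of the LDA route follows, assuming a current scikit-learn release where the class is `LinearDiscriminantAnalysis` in `sklearn.discriminant_analysis`. The data are synthetic and only illustrate that integer class labels can be passed directly as y, with no dummy matrix needed.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic, illustrative data: two groups of 10 samples, 3 features each
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(10, 3)),
    rng.normal(loc=4.0, scale=1.0, size=(10, 3)),
])

# Integer class labels are passed as a plain 1-D vector
y = np.array([0] * 10 + [1] * 10)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# predict returns the class labels themselves
pred = lda.predict(X)
lda_accuracy = (pred == y).mean()
print(pred, lda_accuracy)
```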