
PLS-DA algorithm in Python

The Partial Least Squares (PLS) algorithm is implemented in the scikit-learn library, as documented here: http://scikit-learn.org/0.12/auto_examples/plot_pls.html. In the case where y is a binary vector, a variant of this algorithm is used: Partial Least Squares Discriminant Analysis (PLS-DA). Does the PLSRegression module in sklearn.pls also implement this binary case? If not, where can I find a Python implementation of it? In my binary case, I'm trying to use PLSRegression:

from sklearn.pls import PLSRegression  # module path referenced in the question (scikit-learn 0.12)

pls = PLSRegression(n_components=10)
pls.fit(x, y)
x_r, y_r = pls.transform(x, y, copy=True)

In the transform function, the code raises an exception on this line:

y_scores = np.dot(Yc, self.y_rotations_)

The error message is "ValueError: matrices are not aligned". Yc is the normalized y vector, and self.y_rotations_ = [1.]. In the fit function, self.y_rotations_ is set to np.ones(1) if the original y is a univariate vector (y.shape[1] == 1).
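For reference, a minimal sketch of the same calls that sidesteps this kind of shape mismatch by passing y as an explicit 2-D column vector; the modern sklearn.cross_decomposition import path and the toy data are assumptions for illustration, not part of the original question:

import numpy as np
from sklearn.cross_decomposition import PLSRegression  # current location of PLSRegression

rng = np.random.RandomState(0)
x = rng.rand(20, 15)                 # toy data: 20 samples, 15 features
y = rng.randint(0, 2, size=(20, 1))  # binary y as a 2-D column vector, not 1-D

pls = PLSRegression(n_components=10)
pls.fit(x, y)
x_r, y_r = pls.transform(x, y, copy=True)  # returns (x_scores, y_scores)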

Noam Peled asked Aug 22 '13 20:08

People also ask

What is PLS-DA?

Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection.

What is the difference between PLS and PLS-DA?

Mathematical operations of PLS regression and PLS-DA are nominally the same, with the major difference being the response that is predicted. In chromatographic applications, PLS-DA aims to predict sample class membership contained in matrix Y based on chromatographic data contained in matrix X.

Is PLS-DA machine learning?

Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier.


2 Answers

PLS-DA is really a "trick" to use PLS for categorical outcomes instead of the usual continuous vector/matrix. The trick consists of creating a dummy indicator matrix of zeros and ones that represents membership in each of the categories. So if you have a binary outcome to be predicted (e.g. male/female, yes/no), your dummy matrix will have TWO columns representing membership in either category.

For example, consider the outcome gender for four people: 2 males and 2 females. The dummy matrix should be coded as:

import numpy as np
dummy = np.array([[1, 1, 0, 0], [0, 0, 1, 1]]).T

where each column represents membership in one of the two categories (male, female).

Then your model for data in the variable Xdata (shape: 4 rows, an arbitrary number of columns) would be:

from sklearn.cross_decomposition import PLSRegression  # current import path

myplsda = PLSRegression().fit(X=Xdata, Y=dummy)

The predicted categories can be extracted by comparing the two indicator columns in mypred:

mypred = myplsda.predict(Xdata)

For each row/case, the predicted gender is the one with the highest predicted membership value.
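Putting it together, here is a minimal end-to-end sketch; the toy Xdata, the n_components value, and the argmax decision rule are illustrative assumptions on top of the answer above:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.RandomState(0)
Xdata = rng.rand(4, 5)  # toy data: 4 samples, 5 features

# Indicator (dummy) matrix: one row per sample, one column per category
dummy = np.array([[1, 1, 0, 0], [0, 0, 1, 1]]).T

myplsda = PLSRegression(n_components=2).fit(X=Xdata, Y=dummy)
mypred = myplsda.predict(Xdata)  # continuous membership scores, shape (4, 2)

# Pick the column (category) with the highest predicted membership per row
predicted = np.argmax(mypred, axis=1)  # 0 = male, 1 = female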

markcelo answered Sep 20 '22 14:09


You can use the Linear Discriminant Analysis package in scikit-learn; it accepts integers for the y value:

LDA-SKLearn

Here is a short tutorial on how to use the LDA: sklearn LDA tutorial
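A minimal sketch of that approach; the modern sklearn.discriminant_analysis import path and the toy data are assumptions for illustration:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.rand(20, 5)             # toy data: 20 samples, 5 features
y = rng.randint(0, 2, size=20)  # integer class labels, which LDA accepts directly

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict(X))           # predicted integer class labels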

Kyle54 answered Sep 19 '22 14:09