I typically get PCA
loadings like this:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_
If I run PCA
using a scikit-learn pipeline:
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)
is it possible to get the loadings?
Simply trying loadings = pipeline.components_
fails:
AttributeError: 'Pipeline' object has no attribute 'components_'
(Also interested in extracting attributes like coef_
from pipelines.)
Did you look at the documentation? http://scikit-learn.org/dev/modules/pipeline.html I feel it is pretty clear.
Update: in 0.21 you can use just square brackets:
pipeline['pca']
or indices
pipeline[1]
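For example, with the pipeline from the question (a minimal sketch, assuming scikit-learn >= 0.21 and that the pipeline has already been fit):

loadings = pipeline['pca'].components_  # by step name
loadings = pipeline[1].components_      # by position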
There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:
pipeline.named_steps['pca']
pipeline.steps[1][1]
This will give you the PCA object, on which you can get components. With named_steps
you can also use attribute access with a .
which allows autocompletion:
pipeline.named_steps.pca.<tab here gives autocomplete>
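Putting this together with the pipeline from the question, and also covering the coef_ case mentioned there, here is a minimal self-contained sketch (the iris data and the logistic-regression pipeline are my own additions for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

# The loadings live on the fitted PCA step, not on the Pipeline itself:
loadings = pipeline.named_steps['pca'].components_

# The same pattern works for coef_ on a final estimator step
# (this logistic-regression pipeline is added purely for illustration):
clf = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('logreg', LogisticRegression()),
])
clf.fit(X, y)
coefs = clf.named_steps['logreg'].coef_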
Working with pipelines is simpler using Neuraxle. For instance, you can do this:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from neuraxle.pipeline import Pipeline

# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    PCA(n_components=2)
])
pipeline, X_t = pipeline.fit_transform(X)

# Get the components:
pca = pipeline[-1]
components = pca.components_
You can access your PCA in any of these three ways, as you wish:
pipeline['PCA']
pipeline[-1]
pipeline[1]
Neuraxle is a pipelining library built on top of scikit-learn to take pipelines to the next level. It makes it easy to manage hyperparameter distribution spaces, nested pipelines, saving and reloading, REST API serving, and more. It is also built to support deep learning algorithms and parallel computing.
You could have pipelines within pipelines as below.
from neuraxle.base import Identity

# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    Identity(),
    Pipeline([
        Identity(),  # Note: an Identity step is a step that does nothing.
        Identity(),  # We use it here for demonstration purposes.
        Identity(),
        Pipeline([
            Identity(),
            PCA(n_components=2)
        ])
    ])
])
pipeline, X_t = pipeline.fit_transform(X)
Then you'd need to do this:
# Get the components:
pca = pipeline["Pipeline"]["Pipeline"][-1]
components = pca.components_
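For comparison, plain scikit-learn pipelines can be nested too, and the inner step is reached by chaining named_steps or indexing. A sketch with hypothetical step names:

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical nested scikit-learn pipeline, for comparison with the
# Neuraxle example above:
inner = Pipeline(steps=[('pca', PCA(n_components=2))])
outer = Pipeline(steps=[('scaling', StandardScaler()), ('inner', inner)])
outer.fit(X)

# Chain named_steps (or indexing) to reach the nested PCA step:
components = outer.named_steps['inner'].named_steps['pca'].components_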