
Getting model attributes from pipeline

I typically get PCA loadings like this:

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_

If I run PCA using a scikit-learn pipeline:

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

is it possible to get the loadings?

Simply trying loadings = pipeline.components_ fails:

AttributeError: 'Pipeline' object has no attribute 'components_' 

(Also interested in extracting attributes like coef_ from pipelines.)

lmart999 asked Mar 03 '15 at 01:03




2 Answers

Did you look at the documentation: http://scikit-learn.org/dev/modules/pipeline.html I feel it is pretty clear.

Update: as of scikit-learn 0.21, you can index the pipeline directly with square brackets, using the step name:

pipeline['pca']

or the step's position:

pipeline[1] 
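
To make the bracket syntax concrete, here is a minimal sketch applied to the pipeline from the question. The data is synthetic (an assumption for illustration only), and it needs scikit-learn 0.21 or later:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed purely for illustration: 50 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

pipeline = Pipeline(steps=[
    ("scaling", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

# Index the fitted pipeline by step name or by position.
pca_by_name = pipeline["pca"]
pca_by_index = pipeline[1]

# Both lookups return the same fitted PCA object, not a copy.
same_object = pca_by_name is pca_by_index

# components_ has one row per component and one column per input feature.
loadings = pca_by_name.components_
print(loadings.shape)
```

Note that `components_` here describes the standardized features, since the PCA step only ever sees the output of the scaling step.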

There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:

pipeline.named_steps['pca']
pipeline.steps[1][1]

Either one gives you the fitted PCA object, from which you can read components_. With named_steps you can also use attribute access with a dot, which allows autocompletion:

pipeline.named_steps.pca.<tab here gives autocomplete>
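
The same lookups work for the coef_ case mentioned in the question. A minimal sketch, assuming a LogisticRegression final step and synthetic data (both chosen here for illustration, not taken from the question):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf_pipeline = Pipeline(steps=[
    ("scaling", StandardScaler()),
    ("clf", LogisticRegression()),
])
clf_pipeline.fit(X, y)

# Three equivalent ways to reach the fitted final estimator.
clf = clf_pipeline.named_steps["clf"]    # by name, dict-style
also_clf = clf_pipeline.named_steps.clf  # by name, attribute-style
last_step = clf_pipeline.steps[-1][1]    # by position in the (name, estimator) list

# For a binary problem, coef_ has shape (1, n_features).
coef = clf.coef_
print(coef.shape)
```

The coefficients refer to the scaled features, because the classifier is fitted on the output of the scaling step rather than on the raw X.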

Andreas Mueller answered Sep 30 '22 at 19:09


Using Neuraxle

Working with pipelines is simpler using Neuraxle. For instance, you can do this:

from neuraxle.pipeline import Pipeline

# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    PCA(n_components=2)
])
pipeline, X_t = pipeline.fit_transform(X)

# Get the components:
pca = pipeline[-1]
components = pca.components_

You can access the PCA step in any of these three ways:

  • pipeline['PCA']
  • pipeline[-1]
  • pipeline[1]

Neuraxle is a pipelining library built on top of scikit-learn. It supports managing spaces of hyperparameter distributions, nested pipelines, saving and reloading, REST API serving, and more. It is also designed to work with deep learning algorithms and to allow parallel computing.

Nested pipelines:

You could have pipelines within pipelines as below.

# Create and fit the pipeline:
pipeline = Pipeline([
    StandardScaler(),
    Identity(),
    Pipeline([
        Identity(),  # Note: an Identity step is a step that does nothing.
        Identity(),  # We use it here for demonstration purposes.
        Identity(),
        Pipeline([
            Identity(),
            PCA(n_components=2)
        ])
    ])
])
pipeline, X_t = pipeline.fit_transform(X)

Then you'd need to do this:

# Get the components:
pca = pipeline["Pipeline"]["Pipeline"][-1]
components = pca.components_

Guillaume Chevalier answered Sep 30 '22 at 20:09