
What is the difference between pipeline and make_pipeline in scikit?

I got this from the sklearn webpage:

  • Pipeline: Pipeline of transforms with a final estimator

  • Make_pipeline: Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor.

But I still do not understand when I have to use each one. Can anyone give me an example?

Asked by Aizzaac on Nov 20 '16


People also ask

What is make_pipeline in sklearn?

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.
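For example, here is a minimal sketch of that naming behaviour (the StandardScaler/LogisticRegression combination is just an arbitrary illustration, not part of the quoted answer):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(), LogisticRegression())
# The step names are the lowercased class names, generated automatically:
print(list(pipe.named_steps))  # ['standardscaler', 'logisticregression']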

What are two advantages of using Sklearn pipelines?

They have several key benefits: They make your workflow much easier to read and understand. They enforce the implementation and order of steps in your project. These in turn make your work much more reproducible.

What is the benefit of using the scikit-learn pipeline utility for data preprocessing?

The Scikit-learn pipeline is a tool that chains all steps of the workflow together for a more streamlined procedure. The key benefit of building a pipeline is improved readability. Pipelines are able to execute a series of transformations with one call, allowing users to attain results with less code.
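As a rough illustration of the "one call" idea (a sketch with a toy dataset; the scaler and classifier here are arbitrary example choices):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([('scale', StandardScaler()), ('svm', SVC())])
# fit() runs each transform in order on the training data, then fits the final estimator:
pipe.fit(X_train, y_train)
# score()/predict() re-apply the fitted transforms before the classifier runs:
print(pipe.score(X_test, y_test))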


1 Answer

The only difference is that make_pipeline generates names for steps automatically.

Step names are needed e.g. if you want to use a pipeline with model selection utilities (e.g. GridSearchCV). With grid search you need to specify parameters for various steps of a pipeline:

pipe = Pipeline([('vec', CountVectorizer()), ('clf', LogisticRegression())])
param_grid = [{'clf__C': [1, 10, 100, 1000]}]
gs = GridSearchCV(pipe, param_grid)
gs.fit(X, y)

Compare it with make_pipeline:

pipe = make_pipeline(CountVectorizer(), LogisticRegression())
param_grid = [{'logisticregression__C': [1, 10, 100, 1000]}]
gs = GridSearchCV(pipe, param_grid)
gs.fit(X, y)

So, with Pipeline:

  • names are explicit; you don't have to figure them out if you need them;
  • the name doesn't change if you change the estimator/transformer used in a step, e.g. if you replace LogisticRegression() with LinearSVC() you can still use clf__C (see the sketch after this list).
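
A small sketch of that second point (same grid as above; LinearSVC is just an example substitute, and it also exposes a C parameter):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# The step is still named 'clf', so the 'clf__C' grid key keeps working:
pipe = Pipeline([('vec', CountVectorizer()), ('clf', LinearSVC())])
param_grid = [{'clf__C': [1, 10, 100, 1000]}]
gs = GridSearchCV(pipe, param_grid)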

With make_pipeline:

  • shorter and arguably more readable notation;
  • names are auto-generated using a straightforward rule (the lowercased class name of the estimator; see the sketch after this list for how duplicates are handled).
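
One related detail (based on current scikit-learn behaviour as I understand it, not stated in the quoted answer): if two steps share a class, make_pipeline disambiguates the generated names with numeric suffixes.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

pipe = make_pipeline(StandardScaler(), PCA(), PCA())
# Duplicate classes get '-1', '-2', ... appended to the lowercased name:
print(list(pipe.named_steps))  # ['standardscaler', 'pca-1', 'pca-2']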

When to use them is up to you :) I prefer make_pipeline for quick experiments and Pipeline for more stable code; a rule of thumb: IPython Notebook -> make_pipeline; Python module in a larger project -> Pipeline. But it is certainly not a big deal to use make_pipeline in a module or Pipeline in a short script or a notebook.

Answered by Mikhail Korobov on Oct 08 '22