
Ridge Regression Grid Search with Pipeline

I am trying to optimize the hyperparameters for ridge regression, while also adding polynomial features. The pipeline itself seems fine, but I get an error when I try to run GridSearchCV. Here is the code:

# Importing the Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from collections import Counter
from IPython.core.display import display, HTML
sns.set_style('darkgrid')

# Data Preprocessing 
from sklearn.datasets import load_boston
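# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet only runs on older scikit-learn versions.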
boston_dataset = load_boston()
dataset = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
dataset['MEDV'] = boston_dataset.target

# X and y Variables
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values.reshape(-1,1)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 25)

# Building the Model ------------------------------------------------------------------------

# Fitting the regressor to the Training set
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge())
]

ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)
# Predicting the Test set results
y_pred = ridge_pipe.predict(X_test)

# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = ridge_pipe, X = X_train, y = y_train, cv = 10)
accuracies.mean()
#accuracies.std()

# Applying Grid Search to find the best model and the best parameters
from sklearn.model_selection import GridSearchCV

parameters = [ {'alpha': np.arange(0, 0.2, 0.01) } ]

grid_search = GridSearchCV(estimator = ridge_pipe, 
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = -1)
grid_search = grid_search.fit(X_train, y_train)  # <-- GETTING ERROR IN HERE

Error:

ValueError: Invalid parameter ridge for estimator

What should I do, or is there a better way to use ridge regression with a pipeline? I would also appreciate some resources on grid search, since I am new to this.

asked Aug 06 '19 by cepel



1 Answer

There are two problems in your code. First, since you are using a pipeline, you need to specify in the parameter grid which step of the pipeline each parameter belongs to. See the official documentation for more information:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below

In this case, since alpha belongs to the ridge regression step and you used the string model for that step in the Pipeline definition, you need to rename the key alpha to model__alpha:

steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge())  # <------ Whatever string you assign here will be used later
]

# Since you have named the step 'model', the key has to be 'model__alpha'
parameters = [ {'model__alpha': np.arange(0, 0.2, 0.01) } ]
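If you are not sure which keys are valid, the pipeline can list them for you. A minimal sketch, assuming the ridge_pipe defined above:

# Every nested parameter is exposed as '<step name>__<parameter name>'
print(sorted(ridge_pipe.get_params().keys()))
# includes entries such as 'model__alpha', 'poly__degree' and 'scalar__with_mean'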

Next, you need to understand that this is a regression dataset. You should not use accuracy here; instead, use a regression scoring function such as mean_squared_error (the scikit-learn documentation lists other regression metrics you can use). Something like this:

from sklearn.metrics import mean_squared_error, make_scorer
# greater_is_better=False tells GridSearchCV that a lower MSE is better
scoring_func = make_scorer(mean_squared_error, greater_is_better=False)

grid_search = GridSearchCV(estimator = ridge_pipe, 
                           param_grid = parameters,
                           scoring = scoring_func,  #<--- Use the scoring func defined above
                           cv = 10,
                           n_jobs = -1)
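For completeness, a minimal sketch of running the search and inspecting the result, assuming the X_train and y_train from the question; passing the built-in string scorer 'neg_mean_squared_error' is an equivalent shortcut that already handles the sign for you:

grid_search = grid_search.fit(X_train, y_train)
print(grid_search.best_params_)  # the best value found for 'model__alpha'
print(grid_search.best_score_)   # the corresponding cross-validated score

# Equivalent shortcut using scikit-learn's built-in scorer name
grid_search = GridSearchCV(ridge_pipe,
                           param_grid = parameters,
                           scoring = 'neg_mean_squared_error',
                           cv = 10,
                           n_jobs = -1).fit(X_train, y_train)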

Here is a link to a Google colab notebook with working code.

answered Sep 23 '22 by Gambit1614