Scikit-Learn GridSearch custom scoring function

Tags:

scikit-learn

I need to perform kernel PCA on a dataset of shape (5000, 26421) to get a lower-dimensional representation. To choose the number of components (say k), I reduce the data, reconstruct it in the original space, and compute the mean squared error between the reconstructed and original data for different values of k.

I came across sklearn's GridSearchCV and want to use it for this parameter estimation. Since kernel PCA has no score function, I have implemented a custom scoring function and am passing it to GridSearchCV.

from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV
import numpy as np
import math

def scorer(clf, X):
    Y1 = clf.inverse_transform(X)
    error = math.sqrt(np.mean((X - Y1)**2))
    return error

param_grid = [
    {'degree': [1, 10], 'kernel': ['poly'], 'n_components': [100, 400, 100]},
    {'gamma': [0.001, 0.0001], 'kernel': ['rbf'], 'n_components': [100, 400, 100]},
]

kpca = KernelPCA(fit_inverse_transform=True, n_jobs=30)
clf = GridSearchCV(estimator=kpca, param_grid=param_grid, scoring=scorer)
clf.fit(X)

However, it results in the below error:

/usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X=array([[ 2.,  2.,  1., ...,  0.,  0.,  0.],
    ....,  0.,  1., ...,  0.,  0.,  0.]], dtype=float32), Y=array([[-0.05904257, -0.02796719,  0.00919842, ....        0.00148251, -0.00311711]], dtype=float32), precomputed=False, dtype=<type 'numpy.float32'>)
    117                              "for %d indexed." %
    118                              (X.shape[0], X.shape[1], Y.shape[0]))
    119     elif X.shape[1] != Y.shape[1]:
    120         raise ValueError("Incompatible dimension for X and Y matrices: "
    121                          "X.shape[1] == %d while Y.shape[1] == %d" % (
--> 122                              X.shape[1], Y.shape[1]))
        X.shape = (1667, 26421)
        Y.shape = (112, 100)
    123 
    124     return X, Y
    125 
    126 

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 26421 while Y.shape[1] == 100

Can someone point out what exactly I am doing wrong?


asked Sep 13 '17 23:09

user1683894

1 Answer

The scoring function is not written as a metric function. A metric function takes only the true and predicted values, so this is how you declare your custom scoring function:

import math
import numpy as np

def my_scorer(y_true, y_predicted):
    error = math.sqrt(np.mean((y_true - y_predicted)**2))
    return error  # root mean squared error: lower is better

Then you can use the make_scorer function in sklearn to wrap it for GridSearchCV. Be sure to set the greater_is_better parameter accordingly:

Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.

I am assuming you are calculating an error, so this parameter should be set to False, since the lower the error, the better:

from sklearn.metrics import make_scorer
my_func = make_scorer(my_scorer, greater_is_better=False)
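To make the sign flip concrete, here is a minimal sketch on toy data. The names rmse, X_toy, y_toy and model are made up for illustration, and LinearRegression stands in for any estimator that has a predict method, which is what a scorer built by make_scorer calls by default.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def rmse(y_true, y_pred):
    # Plain metric function: takes true and predicted values only.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy regression data (hypothetical, only to exercise the scorer).
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = 3.0 * X_toy.ravel() + np.array([0.5, -0.5] * 5)

model = LinearRegression().fit(X_toy, y_toy)

# Error computed directly from the predictions.
raw_error = rmse(y_toy, model.predict(X_toy))

# Scorer object: called as scorer(estimator, X, y), it calls model.predict
# internally and negates the result because greater_is_better=False.
neg_rmse = make_scorer(rmse, greater_is_better=False)
flipped = neg_rmse(model, X_toy, y_toy)

print(raw_error, flipped)
```

The scorer returns the negated error, so a search that maximizes the score ends up minimizing the error.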

Then you pass my_func to GridSearchCV:

GridSearchCV(estimator=my_clf, param_grid=param_grid, scoring=my_func)

Here my_clf is your estimator.
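One caveat for the KernelPCA case in the question: a scorer built with make_scorer calls predict on the estimator, and KernelPCA does not define predict. GridSearchCV also accepts a plain callable with the signature scorer(estimator, X, y), so a rough sketch of a reconstruction-error scorer could look like the following. It assumes the shape mismatch in the traceback comes from calling inverse_transform on data that was never transformed; reconstruction_scorer, X_demo and the small grid are made up for illustration and are not the asker's data or settings.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV


def reconstruction_scorer(estimator, X, y=None):
    # Project into the reduced space first, then map back to the input space.
    X_reduced = estimator.transform(X)
    X_restored = estimator.inverse_transform(X_reduced)
    rmse = np.sqrt(np.mean((X - X_restored) ** 2))
    # GridSearchCV maximizes the score, so return the negated error.
    return -float(rmse)


# Small synthetic stand-in for the real (5000, 26421) matrix (assumption).
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 8)

# Illustrative grid only; the values are not tuned for any real data.
param_grid = {"kernel": ["rbf"], "gamma": [0.1, 1.0], "n_components": [2, 4]}

kpca = KernelPCA(fit_inverse_transform=True)
search = GridSearchCV(kpca, param_grid, scoring=reconstruction_scorer, cv=3)
search.fit(X_demo)  # no y: the scorer only needs X

print(search.best_params_, search.best_score_)
```

Each parameter setting is fitted on the training folds and scored by how well the held-out fold is reconstructed, so best_score_ is a negated RMSE.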

One more thing: I don't think GridSearchCV on its own is exactly what you are looking for. It evaluates each parameter setting on train and test splits of the data, but here you only want to transform your input data. You need to use Pipeline in sklearn. Look at the scikit-learn example that combines PCA and GridSearchCV in a pipeline.
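A minimal sketch of the Pipeline idea, assuming the reduced data feeds a supervised model and that labels are available (the question does not say so). The synthetic data from make_classification, the LogisticRegression step, and the grid values are all placeholders for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic labelled data standing in for the real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=10, n_informative=5, random_state=0
)

# Dimensionality reduction followed by a classifier.
pipe = Pipeline([
    ("kpca", KernelPCA(kernel="rbf")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "kpca__n_components": [2, 5],
    "kpca__gamma": [0.01, 0.1],
}

# With no scoring argument, the classifier's default score (accuracy) is used.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_demo, y_demo)

print(search.best_params_, search.best_score_)
```

Here the number of components is chosen by how well the downstream model performs on held-out folds, not by reconstruction error.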


answered Sep 27 '22 22:09

Gambit1614
