I've created a custom hand-coded classifier that implements the standard sklearn classifier methods (fit(), predict() and predict_proba()). Can this be used directly with the sklearn utility GridSearchCV(), or are there additions that should be made?
EDIT 1: On cel's suggestion I tried applying it directly.
The first step was to add get_params and set_params as explained here. Sure enough, the complete cross-validation procedure did run, but it ended with the following error:
return self._fit(X, y, ParameterGrid(self.param_grid))
best_estimator.fit(X, y, **self.fit_params)
AttributeError: 'NoneType' object has no attribute 'fit'
EDIT 2: Adding the classifier code (it's a Theano-based logistic regression classifier):
import time

import numpy as np
import theano
import theano.tensor as T

# `Layer` and `shared_dataset` are helpers defined elsewhere in my code (not
# shown): `Layer` holds the W/b parameters plus an activation, and
# `shared_dataset` loads numpy arrays into Theano shared variables.

class LogisticRegression:
    """ Apply minibatch logistic regression

    :type n_in: int
    :param n_in: number of input units, the dimension of the space in
                 which the datapoints lie

    :type n_out: int
    :param n_out: number of output units, the dimension of the space in
                  which the labels lie
    """
    def __init__(self, n_in, n_out, batch_size=600, learning_rate=0.13,
                 iters=500, verbose=0):
        self.n_in = n_in
        self.n_out = n_out
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.iters = iters
        self.verbose = verbose
        self.single_layer = Layer(self.n_in, self.n_out, T.nnet.softmax)
        self.minibatch_count = 0

    def get_params(self, deep=True):
        return {"n_in": self.n_in, "n_out": self.n_out,
                "batch_size": self.batch_size,
                "learning_rate": self.learning_rate,
                "iters": self.iters, "verbose": self.verbose}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)

    def minibatch_trainer(self, data_x, data_y):
        n_batches = data_x.get_value(borrow=True).shape[0] / self.batch_size
        tensor_x = T.matrix('x')
        tensor_y = T.ivector('y')
        index = T.lscalar('index')
        cost = self.single_layer.negative_log_likelihood(tensor_x, tensor_y)
        # plain gradient-descent updates on the layer's weights and bias
        g_W = T.grad(cost, self.single_layer.W)
        g_b = T.grad(cost, self.single_layer.b)
        updates = [(self.single_layer.W, self.single_layer.W - g_W * self.learning_rate),
                   (self.single_layer.b, self.single_layer.b - g_b * self.learning_rate)]
        train_batch = theano.function(
            [index], [cost],
            updates=updates,
            givens={tensor_x: data_x[index * self.batch_size: (index + 1) * self.batch_size],
                    tensor_y: data_y[index * self.batch_size: (index + 1) * self.batch_size]})
        return np.mean([train_batch(i) for i in xrange(n_batches)])

    def fit(self, data_x, data_y):
        data_x, data_y = shared_dataset(data_x, data_y)
        start = time.clock()
        for iter in xrange(self.iters):
            train_err = self.minibatch_trainer(data_x, data_y)
            if self.verbose == 1:
                print "Iter %d --> %f" % (iter, train_err)
        end = time.clock()
        print "Finished Training Logistic Regression Model\n" \
              "Iterations %d\n" \
              "Time Taken : %d secs" % (self.iters, end - start)
        return self

    def partial_fit(self, data_x, data_y):
        data_x, data_y = shared_dataset(data_x, data_y)
        self.minibatch_count += 1
        err = self.minibatch_trainer(data_x, data_y)
        print "MiniBatch %d --> %f" % (self.minibatch_count, err)

    def predict(self, data_x):
        data_x = shared_dataset(data_x)
        n_batches = data_x.get_value(borrow=True).shape[0] / self.batch_size
        tensor_x = T.matrix('x')
        index = T.lscalar('index')
        tensor_ypred = self.prediction_tensor(tensor_x)
        predictor = theano.function(
            [index], tensor_ypred,
            givens={tensor_x: data_x[index * self.batch_size: (index + 1) * self.batch_size]})
        ypred = [predictor(i) for i in xrange(n_batches)]
        return np.hstack(ypred)

    def predict_proba(self, data_x):
        data_x = shared_dataset(data_x)
        tensor_x = T.matrix('x')
        tensor_ypredproba = self.single_layer.decision_function_tensor(tensor_x)
        predproba_func = theano.function([], tensor_ypredproba,
                                         givens={tensor_x: data_x})
        return predproba_func()

    def prediction_tensor(self, tensor_x):
        """
        Returns the predicted y value as a tensor variable

        :param tensor_x: TensorType matrix of input data
        :return: TensorType tensor_ypred output
        """
        return T.argmax(self.single_layer.decision_function_tensor(tensor_x), axis=1)
EDIT 3: Adding the exact usage of GridSearchCV:

clf_cv = GridSearchCV(LogisticRegression(n_in=200, n_out=2), {"iters": [3]},
                      cv=4, scoring="roc_auc", n_jobs=-1, verbose=1)

I've also tried adding BaseEstimator and ClassifierMixin; sklearn.base.clone does not raise any errors.
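A minimal version of that check looks roughly like the following (a sketch; clone rebuilds the estimator from get_params(), so it fails loudly if get_params and __init__ disagree):

from sklearn.base import clone

# clone() constructs a fresh LogisticRegression(**get_params()) and verifies
# the parameters round-trip; it never calls set_params, which is why it can
# succeed even though GridSearchCV later fails
clone(LogisticRegression(n_in=200, n_out=2))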
GridSearchCV performs an exhaustive search over a grid of parameter values, scoring each candidate combination by cross-validation. You pass it the model and the parameter grid; after the search it refits the estimator with the best parameter values found, and that refitted estimator is used for predictions.
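A rough sketch of that workflow (illustrative only, using sklearn's built-in SVC; the import path is sklearn.model_selection on newer versions, sklearn.grid_search on the old versions matching the traceback above):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# exhaustive search over a 3x2 grid; each candidate is scored by 4-fold CV
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}, cv=4)
search.fit(X, y)

print(search.best_params_)  # best parameter combination found
print(search.best_score_)   # its mean cross-validated score
ypred = search.predict(X)   # delegates to the estimator refit with best_params_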
Per the documentation this may need extra memory if the dataset is big, and you may have to use the pre_dispatch parameter. Note also that each parameter combination is refit once per CV fold, so the folds multiply the runtime: I have 3 parameters with 10 levels each to scan and a single run takes about 19 seconds, so 10*3*19 = 570 s ≈ 10 minutes of fitting, but with, say, 4 folds that becomes roughly 570*4 ≈ 38 minutes, which would explain why I definitely have to wait about 35-45 minutes.
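For instance, a sketch reusing the constructor call from the question (the pre_dispatch value here is illustrative):

# pre_dispatch caps how many jobs are dispatched at once, and hence how many
# copies of the data can be in memory simultaneously; the default is '2*n_jobs'
clf_cv = GridSearchCV(LogisticRegression(n_in=200, n_out=2), {"iters": [3]},
                      cv=4, scoring="roc_auc", n_jobs=-1, verbose=1,
                      pre_dispatch='n_jobs')  # dispatch fewer jobs to save memory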
Judging by the documentation, if you pass an integer for cv, GridSearchCV already uses StratifiedKFold in some cases: "For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used."
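In other words (a sketch against the newer sklearn.model_selection API, assuming a classifier and binary/multiclass y), the integer form is shorthand for an explicit StratifiedKFold:

from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

clf = SkLogisticRegression()
param_grid = {"C": [0.1, 1, 10]}

# for a classifier with binary/multiclass y these set up the same splits
gs_int = GridSearchCV(clf, param_grid, cv=4)
gs_skf = GridSearchCV(clf, param_grid, cv=StratifiedKFold(n_splits=4))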
Had the same problem a couple of minutes ago. The documentation is incorrect: you have to change set_params to return self. Internally GridSearchCV builds the final model as clone(estimator).set_params(**best_parameters) and then calls best_estimator.fit(X, y, **self.fit_params), so when set_params returns None the chained call leaves best_estimator as None, which is exactly the AttributeError above:
def set_params(self, **parameters):
    for parameter, value in parameters.items():
        setattr(self, parameter, value)
    return self
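A cleaner alternative (my sketch, not part of the original answer): inherit from sklearn.base.BaseEstimator, which derives get_params() from the __init__ signature and provides a set_params() that correctly returns self, so neither method needs to be hand-coded:

from sklearn.base import BaseEstimator, ClassifierMixin

class LogisticRegression(BaseEstimator, ClassifierMixin):
    def __init__(self, n_in=None, n_out=None, batch_size=600,
                 learning_rate=0.13, iters=500, verbose=0):
        # BaseEstimator expects __init__ to store each argument verbatim
        # under the same name; defer any real work to fit()
        self.n_in = n_in
        self.n_out = n_out
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.iters = iters
        self.verbose = verbose

    # fit / predict / predict_proba as in the question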