 

Python scikit-learn - TypeError

I'm writing a little program to plot the learning curves of SVM and Naive Bayes on a dataset with cross-validation. This is the code of the plotting function:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import cross_validation
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.learning_curve import learning_curve

def plot_learning_curves(X, y, nb=GaussianNB, svc=SVC(kernel='linear'), ylim=None, cv=None, n_jobs=1,
                     train_sizes=np.linspace(.1, 1.0, 5)):
    plt.figure()
    plt.title('Learning Curves with NB and SVM')
    if ylim is not None:
        plt.ylim(*ylim)

    # learning_curve returns three arrays: train sizes, train scores, test scores
    train_sizes_nb, train_scores_nb, test_scores_nb = learning_curve(
        nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_nb = np.mean(test_scores_nb, axis=1)

    train_sizes_svc, train_scores_svc, test_scores_svc = learning_curve(
        svc, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_svc = np.mean(test_scores_svc, axis=1)

    plt.grid()

    plt.plot(train_sizes_nb, test_scores_mean_nb, 'o-', color="g",
             label="NB")
    plt.plot(train_sizes_svc, test_scores_mean_svc,'o',color="r",label="SVM")    

    return plt

And this is the function call:

digits = load_digits()
X, y = digits.data, digits.target

cv = cross_validation.ShuffleSplit(digits.data.shape[0], n_iter=100,
                               test_size=0.2, random_state=0)
plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv,n_jobs=1)
plt.show()

I don't know what the problem is, but I get this error:

Traceback (most recent call last):
  File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 43, in <module>
    plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv,n_jobs=1)
  File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 19, in plot_learning_curves
    nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in learning_curve
    for train, test in cv for n_train_samples in train_sizes_abs)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 652, in __call__
    for function, args, kwargs in iterable:
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in <genexpr>
    for train, test in cv for n_train_samples in train_sizes_abs)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)

Process finished with exit code 1

I don't understand what the line "TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)" means.

What would be a possible solution?

Gianmarco Biscini asked Nov 29 '22 23:11



2 Answers

The solution was pretty easy. It's not

nb=GaussianNB

but

nb=GaussianNB()
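For reference, a minimal sketch of the corrected call. It assumes a current scikit-learn release, in which the sklearn.cross_validation and sklearn.learning_curve modules used above have been replaced by sklearn.model_selection; the 5 splits (instead of n_iter=100) are only there to keep it quick. Note that learning_curve returns three arrays (training sizes, training scores, test scores), not two.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# 5 random 80/20 splits (an assumption for speed; the question used n_iter=100)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

# Pass an *instance* (note the parentheses), not the GaussianNB class itself.
# learning_curve returns three arrays, not two.
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5))

# Scores have one row per training size and one column per CV split.
test_scores_mean = test_scores.mean(axis=1)
print(train_sizes)
print(test_scores_mean)
```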
Gianmarco Biscini answered Dec 05 '22 04:12



TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)

This error means that get_params() was called on the GaussianNB class itself rather than on a GaussianNB instance, so it received nothing for its first argument (self).
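The same kind of error can be reproduced without scikit-learn. In this minimal sketch, Estimator is a hypothetical stand-in class and call_get_params a hypothetical helper that mimics the failing line at the bottom of the traceback (estimator.get_params(deep=False)). Calling an instance method through the class leaves self unfilled; Python 2 reports that as an "unbound method" error, while Python 3 reports a missing 'self' argument instead.

```python
class Estimator:
    """Hypothetical stand-in for an sklearn estimator such as GaussianNB."""

    def get_params(self, deep=True):
        return {"deep": deep}


def call_get_params(estimator):
    """Mimics the failing line in the traceback: estimator.get_params(deep=False)."""
    return estimator.get_params(deep=False)


# Works: the instance is bound to `self` automatically.
print(call_get_params(Estimator()))  # {'deep': False}

# Fails: there is no instance to bind to `self`.
try:
    call_get_params(Estimator)
except TypeError as exc:
    print("TypeError:", exc)
```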

The error happens several steps deep inside the sklearn module, so it's hard to pin down the exact cause without stepping through the code with a debugger and reading the sklearn source code.

If you are using IPython, the %debug magic command is very useful for investigating these kinds of exceptions.
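Outside IPython, pdb.post_mortem() opens a similar post-mortem prompt on a caught exception's traceback. As a non-interactive sketch of what you would inspect there, the code below walks the traceback frame by frame and collects each frame's local variables; seeing that the estimator local is a class rather than an instance points straight at the bug. Estimator, clone_like and frame_locals are hypothetical names used only for illustration.

```python
import sys


class Estimator:
    """Hypothetical stand-in for GaussianNB."""

    def get_params(self, deep=True):
        return {}


def clone_like(estimator):
    """Hypothetical, heavily simplified stand-in for sklearn.base.clone."""
    return estimator.get_params(deep=False)


def frame_locals(func, *args):
    """Call func(*args); if it raises, return (function name, locals) for each
    traceback frame, outermost first. Returns an empty list if nothing raises."""
    try:
        func(*args)
    except Exception:
        tb = sys.exc_info()[2]
        frames = []
        while tb is not None:
            frame = tb.tb_frame
            frames.append((frame.f_code.co_name, dict(frame.f_locals)))
            tb = tb.tb_next
        return frames
    return []


for name, local_vars in frame_locals(clone_like, Estimator):
    est = local_vars.get("estimator")
    if est is not None:
        # isinstance(est, type) is True when a class was passed instead of an instance
        print(name, "-> estimator is a class:", isinstance(est, type))
```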

Looking at your code, the problem is likely that you are passing the class GaussianNB, rather than an instance of that class, to sklearn.learning_curve.learning_curve().

From the learning_curve docs:

Parameters:
estimator : object type that implements the “fit” and “predict” methods
    An object of that type which is cloned for each validation.

I find this ambiguous, but the example code in the docs uses a GaussianNB instance, not a type.
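The traceback ends in sklearn.base.clone, which learning_curve calls on the estimator you pass in. A quick check like this sketch (assuming a reasonably recent scikit-learn; the exact error message differs between versions) shows that clone accepts an instance but raises a TypeError for the bare class:

```python
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb_copy = clone(nb)  # a new, unfitted estimator with the same parameters
print(nb_copy is nb)                            # False
print(nb_copy.get_params() == nb.get_params())  # True

try:
    clone(GaussianNB)  # the class, not an instance
except TypeError as exc:
    print("TypeError:", exc)
```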

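In addition, using mutable objects as default arguments is usually not a good idea. Default values are evaluated only once, when the function is defined, so every call that relies on a default shares the same object, and estimator instances such as SVC(kernel='linear') are mutable. It also makes your code harder to read and debug.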
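A minimal sketch of the pitfall and the usual fix, using a hypothetical Model class in place of an estimator: the default Model() is built once when the def statement runs, so every call that relies on it shares one object, while a None default creates a fresh object on each call.

```python
class Model:
    """Hypothetical stand-in for a mutable estimator object."""

    def __init__(self):
        self.fitted_on = []

    def fit(self, data):
        self.fitted_on.append(data)  # mutates the object in place
        return self


def train_shared(data, model=Model()):
    # Model() was evaluated once, when this def statement ran.
    return model.fit(data)


def train_fresh(data, model=None):
    # None sentinel: build a new Model on each call unless one is passed in.
    if model is None:
        model = Model()
    return model.fit(data)


a, b = train_shared("first"), train_shared("second")
print(a is b, b.fitted_on)  # True ['first', 'second']

c, d = train_fresh("first"), train_fresh("second")
print(c is d, d.fitted_on)  # False ['second']
```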

With this many optional keyword arguments, something like this could be more readable:

def plot_learning_curves(x, y, ylim=None, **kwargs):
    """ Plots learning curves with NB and SVM """
    nb = kwargs.get('nb', GaussianNB())
    svc = kwargs.get('svc', SVC(kernel='linear'))
    train_sizes = kwargs.get('train_sizes', np.linspace(.1, 1.0, 5))     

You might not need those keyword arguments at all. It looks like you started out by copying some example code and adding your own stuff. It's better to simplify the example code first and make sure you understand what's happening:

def plot_learning_curves(x, y, ylim=None):
    nb = GaussianNB()
    svc = SVC(kernel='linear')
    train_sizes = np.linspace(.1, 1.0, 5)
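Filled in along those lines, the whole function might look like the sketch below. It assumes a current scikit-learn (sklearn.model_selection instead of the removed cross_validation and learning_curve modules), adds a cv parameter, and returns the Matplotlib figure instead of the pyplot module; the Agg backend is an assumption chosen so the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def plot_learning_curves(X, y, cv=None, ylim=None):
    """Plot mean cross-validated test accuracy of NB and a linear SVM vs. training size."""
    nb = GaussianNB()  # instances, created fresh on every call
    svc = SVC(kernel="linear")
    train_sizes = np.linspace(0.1, 1.0, 5)

    fig, ax = plt.subplots()
    ax.set_title("Learning Curves with NB and SVM")
    ax.set_xlabel("Training examples")
    ax.set_ylabel("Mean test accuracy")
    if ylim is not None:
        ax.set_ylim(*ylim)
    ax.grid()

    for estimator, label, color in [(nb, "NB", "g"), (svc, "SVM", "r")]:
        # learning_curve returns (train_sizes_abs, train_scores, test_scores)
        sizes, _, test_scores = learning_curve(
            estimator, X, y, cv=cv, train_sizes=train_sizes)
        ax.plot(sizes, test_scores.mean(axis=1), "o-", color=color, label=label)

    ax.legend(loc="best")
    return fig


X, y = load_digits(return_X_y=True)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
fig = plot_learning_curves(X, y, cv=cv, ylim=(0.7, 1.01))
```

Call fig.savefig("learning_curves.png") to write the plot to disk, or remove the matplotlib.use("Agg") line and call plt.show() to open a window as in the question.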
Håken Lid answered Dec 05 '22 02:12
