
TypeError: get_params() missing 1 required positional argument: 'self'

I was trying to use the scikit-learn package with Python 3.4 to do a grid search:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run successfully. The error message looks like this:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

I also tried to use only

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

Xiangru Lian asked May 04 '15 09:05



2 Answers

This error is almost always misleading, and actually means that you're calling an instance method on the class, rather than the instance (like calling dict.keys() instead of d.keys() on a dict named d).*

And that's exactly what's going on here, although not in the place you might look first. The traceback ends inside clone(), which GridSearchCV.fit() calls on the estimator you passed in before any fitting happens, and clone() in turn calls get_params(deep=False) on each object it copies. So the failure has nothing to do with best_estimator_ (which is an ordinary fitted estimator instance once the search finishes, provided refit is left on); one of the objects being cloned must be a class rather than an instance.

Separately, if you're trying to get the params that were used for the candidate that returned the best score, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess…)


For the Pipeline call, you definitely have an instance there. And the only documentation for that method is a param spec which shows that it takes one optional argument, deep. But under the covers, it's probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you're constructing it with the class LogisticRegression, rather than an instance of that class, so if that's what it ends up forwarding to, that would explain the problem.
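To make the forwarding idea concrete, here is a minimal sketch using toy classes. ToyStep and ToyPipeline are hypothetical stand-ins written for this illustration; they are not scikit-learn's actual implementation, only a container that forwards get_params() to each of its steps in a similar spirit.

```python
class ToyStep:
    """Hypothetical estimator-like class; not part of scikit-learn."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}


class ToyPipeline:
    """Hypothetical container that forwards get_params() to every step."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        params = {"steps": self.steps}
        if deep:
            for name, step in self.steps:
                # If `step` is a class rather than an instance, this call
                # has nothing to bind to `self` and raises TypeError.
                for key, value in step.get_params(deep=True).items():
                    params[name + "__" + key] = value
        return params


good = ToyPipeline([("clf", ToyStep())])  # instance: note the ()
bad = ToyPipeline([("clf", ToyStep)])     # class: the () was forgotten

good_params = good.get_params()

try:
    bad.get_params()
    bad_error = None
except TypeError as exc:
    bad_error = str(exc)

print(good_params["clf__alpha"])
print(bad_error)
```

The pipeline object itself is a perfectly good instance in both cases; the error only surfaces when the call reaches the step that was passed as a class.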


* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something is that in Python, d.keys() is effectively turned into dict.keys(d), and it's perfectly legal (and sometimes useful) to call it that way explicitly, so Python can't really tell you that dict.keys() is illegal, just that it's missing the self argument.
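The footnote can be checked with a few lines of plain Python. The Estimator class below is a hypothetical example defined only for this sketch; the exact wording of the message varies slightly between Python versions (newer ones prefix the class name), but the "missing 1 required positional argument: 'self'" part is the same.

```python
class Estimator:
    """Hypothetical stand-in for any class with an instance method."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {"C": self.C}


est = Estimator(C=10)

# Normal call: Python passes `est` as `self` automatically.
bound_result = est.get_params()

# Equivalent explicit call through the class, passing the instance by hand.
explicit_result = Estimator.get_params(est)

# Call on the class with no instance: nothing fills the `self` slot.
try:
    Estimator.get_params()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(bound_result, explicit_result)
print(error_message)
```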

abarnert answered Sep 19 '22 19:09


I finally got the problem solved. The cause is exactly what abarnert said.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked.

With that intuition I modified the pipeline to be:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression: the pipeline now receives an instance rather than the class itself. This time it works.
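Since a missing () is easy to overlook, one possible safeguard is to scan the step list before building the pipeline. This is only a sketch: find_uninstantiated_steps is a hypothetical helper, not a scikit-learn function, and the two Dummy classes are placeholders so the example runs without any third-party package.

```python
import inspect


def find_uninstantiated_steps(steps):
    """Return the names of steps that are classes instead of instances."""
    return [name for name, step in steps if inspect.isclass(step)]


class DummyVectorizer:
    """Hypothetical placeholder for a transformer class."""


class DummyClassifier:
    """Hypothetical placeholder for a classifier class."""


# Same mistake as in the question: the second step is missing its ().
steps = [("vect", DummyVectorizer()), ("clf", DummyClassifier)]

missing = find_uninstantiated_steps(steps)
print(missing)  # ['clf']
```

The same (name, step) list can then be handed to Pipeline once the check comes back empty.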

Xiangru Lian answered Sep 21 '22 19:09