TypeError: get_params() missing 1 required positional argument: 'self'

I was trying to use the scikit-learn package with Python 3.4 to do a grid search:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run successfully. The error message looks like this:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

I also tried calling only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

asked May 04 '15 09:05

Xiangru Lian

2 Answers

This error is almost always misleading, and actually means that you're calling an instance method on the class, rather than the instance (like calling dict.keys() instead of d.keys() on a dict named d).*

And that's exactly what's going on here. The docs imply that the best_estimator_ attribute, like the estimator parameter to the initializer, is not an estimator instance, it's an estimator type, and "A object of that type is instantiated for each grid point."

So, if you want to call methods, you have to construct an object of that type, for some particular grid point.

However, from a quick glance at the docs, if you're trying to get the params that were used for the particular instance of the best estimator that returned the best score, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess…)

For the Pipeline call, you definitely have an instance there. And the only documentation for that method is a param spec which shows that it takes one optional argument, deep. But under the covers, it's probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you're constructing it with the class LogisticRegression, rather than an instance of that class, so if that's what it ends up forwarding to, that would explain the problem.
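The traceback in the question ends inside clone(), called from fit(), which fits this explanation. Here is a rough sketch of what seems to be happening. It is a simplification reconstructed from the traceback lines, not scikit-learn's actual code; simplified_clone and ToyEstimator are made-up names for illustration.

```python
import copy


def simplified_clone(obj):
    """Rough sketch of the recursion visible in the traceback (not sklearn's code)."""
    if type(obj) in (list, tuple):
        # Containers are cloned element by element.
        return type(obj)(simplified_clone(item) for item in obj)
    if not hasattr(obj, "get_params"):
        # Plain values such as step names are simply copied.
        return copy.deepcopy(obj)
    # A class also has a `get_params` attribute, so it reaches this call,
    # but there is no instance to bind as `self`.
    params = obj.get_params(deep=False)
    return type(obj)(**params)


class ToyEstimator:
    """Hypothetical stand-in for an estimator such as LogisticRegression."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {"C": self.C}


good_steps = [("clf", ToyEstimator(C=10))]  # instance: clones fine
bad_steps = [("clf", ToyEstimator)]         # class: no `self` available

cloned = simplified_clone(good_steps)

try:
    simplified_clone(bad_steps)
except TypeError as exc:
    clone_error = str(exc)
```

With the instance, the sketch returns a fresh copy of the step list. With the class, the get_params(deep=False) call fails with the same kind of TypeError as in the question.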

* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something is that in Python, d.keys() is effectively turned into dict.keys(d), and it's perfectly legal (and sometimes useful) to call it that way explicitly, so Python can't really tell you that dict.keys() is illegal, just that it's missing the self argument.
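To make the footnote concrete, here is a minimal sketch using a made-up Greeter class (hypothetical, for illustration only). It uses a class written in Python because built-in types such as dict may word the error differently.

```python
class Greeter:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


g = Greeter("world")

# Normal call: Python passes `g` as `self` automatically.
bound_result = g.greet()

# Explicit call through the class: legal, `g` is passed as `self` by hand.
explicit_result = Greeter.greet(g)

# The same equivalence holds for built-in types such as dict.
d = {"a": 1}
assert list(d.keys()) == list(dict.keys(d))

# Call through the class with no instance: nothing fills the `self` slot.
try:
    Greeter.greet()
except TypeError as exc:
    message = str(exc)
```

The first two calls return the same string, and the last one raises a TypeError whose message says that the positional argument 'self' is missing.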

answered Sep 19 '22 19:09

abarnert

I finally got the problem solved. The reason is exactly what abarnert said.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked well.

With that intuition I modified the pipeline to be:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression, which passes an instance to the pipeline instead of the class itself. This time it works.
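A small check along these lines could catch the mistake before fitting. This is only a sketch: find_uninstantiated_steps and the two stand-in classes are made-up names for illustration, not part of scikit-learn.

```python
import inspect


def find_uninstantiated_steps(steps):
    """Return names of steps whose value is a class instead of an instance."""
    return [name for name, step in steps if inspect.isclass(step)]


class VectorizerStandIn:
    """Hypothetical placeholder for TfidfVectorizer."""


class ClassifierStandIn:
    """Hypothetical placeholder for LogisticRegression."""


# Mirrors the original pipeline: the classifier is missing its ().
broken_steps = [("vect", VectorizerStandIn()), ("clf", ClassifierStandIn)]

# Mirrors the fixed pipeline: both steps are instances.
fixed_steps = [("vect", VectorizerStandIn()), ("clf", ClassifierStandIn())]

broken_names = find_uninstantiated_steps(broken_steps)
fixed_names = find_uninstantiated_steps(fixed_steps)
```

For the broken list the function reports the 'clf' step, and for the fixed list it reports nothing.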

answered Sep 21 '22 19:09

Xiangru Lian
