 

fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I am totally new to Machine Learning and I have been working with an unsupervised learning technique.

Screenshot (not shown): my sample data, after all cleaning.

I have these two pipelines built to clean the data:

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

print(type(num_attribs))

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])

Then I took the union of these two pipelines; the code for that is shown below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

Now I am trying to run fit_transform on the data, but it shows me an error.

Code for Transformation:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error message:

fit_transform() takes 2 positional arguments but 3 were given

asked Sep 11 '17 by Viral Parmar

People also ask

What is the use of Fit_transform () function?

fit_transform() is used on the training data so that we can scale it and learn the scaling parameters at the same time. Here, the model will learn the mean and variance of the features of the training set. These learned parameters are then used to scale the test data.

What is difference between fit () Transform () and Fit_transform ()?

The fit() function calculates the values of these parameters. The transform() function applies them to the actual data and returns the normalized values. The fit_transform() function performs both in the same step. Note that the same result is obtained whether we perform the operation in two steps or in one.

What does Fit_transform do in Sklearn?

fit_transform(): This method performs fit and transform on the input data in a single call and converts the data points. Using fit and transform separately when we need both is less efficient, so we use fit_transform(), which does both at once.
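A quick way to see that the two-step and one-step paths agree (using a toy array rather than the housing data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

# Two steps: fit() learns the mean and variance, transform() applies them.
scaler = StandardScaler()
scaler.fit(X)
two_step = scaler.transform(X)

# One step: fit_transform() learns and applies in a single call.
one_step = StandardScaler().fit_transform(X)

print(np.allclose(two_step, one_step))  # True
```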


Why labelbinarizer does not work on features?

LabelBinarizer is not supposed to be used with X (features); it is intended for labels only. Hence its fit and fit_transform methods are defined to take only a single data argument, y. But the Pipeline (which works on features) will try sending both X and y to it. Hence the error.
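A minimal reproduction of that mismatch (the category strings below are just example values from the ocean_proximity column):

```python
from sklearn.preprocessing import LabelBinarizer

y = ["NEAR BAY", "INLAND", "NEAR BAY", "ISLAND"]

lb = LabelBinarizer()
print(lb.fit_transform(y))  # fine: fit_transform expects a single data argument

# A Pipeline, however, calls fit_transform(X, y) with two data arguments:
try:
    lb.fit_transform(y, None)
except TypeError as err:
    print(err)  # ...takes 2 positional arguments but 3 were given
```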


3 Answers

The Problem:

The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:

def fit_transform(self, x, y):
    ...rest of the code

while it is defined to take only two:

def fit_transform(self, x):
    ...rest of the code

Possible Solution:

This can be solved by making a custom transformer that can handle 3 positional arguments:

  1. Import and make a new class:

    from sklearn.base import TransformerMixin  # gives fit_transform for free
    from sklearn.preprocessing import LabelBinarizer

    class MyLabelBinarizer(TransformerMixin):
        def __init__(self, *args, **kwargs):
            self.encoder = LabelBinarizer(*args, **kwargs)
        def fit(self, x, y=None):
            # accept and ignore the extra y argument the Pipeline passes
            self.encoder.fit(x)
            return self
        def transform(self, x, y=None):
            return self.encoder.transform(x)
    
  2. Keep your code the same; instead of using LabelBinarizer(), use the class we created: MyLabelBinarizer().


Note: If you want access to LabelBinarizer Attributes (e.g. classes_), add the following line to the fit method:
    self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_
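Put together, the wrapper can be sanity-checked on its own (the category strings below are made up for illustration):

```python
from sklearn.base import TransformerMixin
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    """LabelBinarizer wrapper whose fit/transform tolerate the extra y a Pipeline passes."""
    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)
    def fit(self, x, y=None):
        self.encoder.fit(x)
        return self
    def transform(self, x, y=None):
        return self.encoder.transform(x)

cats = ["INLAND", "NEAR BAY", "INLAND"]
out = MyLabelBinarizer().fit_transform(cats, None)  # the extra argument no longer breaks
print(out.shape)  # (3, 1): two classes binarize to a single 0/1 column
```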
answered Oct 16 '22 by Zaid E.


I believe your example is from the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. Unfortunately, I ran into this problem as well. A change in scikit-learn 0.19.0 altered LabelBinarizer's fit_transform method, and LabelBinarizer was never intended to work the way that example uses it. You can see information about the change here and here.

Until they come up with a solution for this, you can install the previous version (0.18.0) as follows:

$ pip install scikit-learn==0.18.0

After running that, your code should run without issue.

In the future, it looks like the correct solution may be to use a CategoricalEncoder class or something similar. Apparently they have been trying to solve this problem for years. You can see the new class here and further discussion of the problem here.

answered Oct 16 '22 by Steven Oxley


I think you are going through the examples from the book: Hands on Machine Learning with Scikit Learn and Tensorflow. I ran into the same problem when going through the example in Chapter 2.

As mentioned by others, the problem is with sklearn's LabelBinarizer: its fit_transform method takes fewer arguments than the other transformers in the pipeline (only y, while other transformers normally take both X and y; see here for details). That's why, when we run pipeline.fit_transform, we feed more arguments into this transformer than it accepts.

An easy fix I used is to use OneHotEncoder instead and set "sparse" to False, ensuring the output is a dense numpy array just like the num_pipeline output. (This way you don't need to write your own custom encoder.)

Your original cat_pipeline:

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])

You can simply change this part to:

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('one_hot_encoder', OneHotEncoder(sparse=False))
])

You can go from here and everything should work.

answered Oct 16 '22 by Norman Yan