I am totally new to Machine Learning and I have been working with unsupervised learning techniques.
The image below shows my sample data (after all cleaning). Screenshot: Sample Data
I have these two pipelines built to clean the data:
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
print(type(num_attribs))

# DataFrameSelector and CombinedAttributesAdder are custom transformers
# defined earlier in my code
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])
Then I did the union of these two pipelines; the code for the same is shown below:
from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
Now I am trying to run fit_transform on the data, but it's showing me the error below.
Code for the transformation:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
Error message:
fit_transform() takes 2 positional arguments but 3 were given
fit_transform() is used on the training data so that we can scale it and, at the same time, learn the scaling parameters. Here, the model learns the mean and variance of the features of the training set; these learned parameters are then used to scale our test data.
The fit() function calculates the values of these parameters, and the transform() function applies them to the data to produce the normalized values. fit_transform() performs both in a single step. Note that you get the same result whether you run the two steps separately or use the combined call; fit_transform() simply avoids a redundant pass over the data.
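As a quick illustration, here is a minimal sketch using StandardScaler (the arrays are made up for the example):

from sklearn.preprocessing import StandardScaler
import numpy as np

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/variance and scales in one step
X_test_scaled = scaler.transform(X_test)        # reuses the parameters learned on the training set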
LabelBinarizer is not supposed to be used with X (features); it is intended for labels only. Hence its fit and fit_transform methods are defined to take only a single data object, y. But the Pipeline (which works on features) will try sending both X and y to it. Hence the error.
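The intended usage looks like this (a minimal sketch; the label values are made up):

from sklearn.preprocessing import LabelBinarizer

y = ["INLAND", "NEAR BAY", "INLAND", "<1H OCEAN"]
lb = LabelBinarizer()
print(lb.fit_transform(y))  # fine: fit_transform takes a single data argument (the labels)

Inside a Pipeline, however, fit_transform(X, y) gets called, which triggers the error above.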
The Problem:
The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:

def fit_transform(self, x, y):
    ...rest of the code

while it is defined to take only two:

def fit_transform(self, x):
    ...rest of the code
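You can reproduce the mismatch outside the pipeline; the second call below mimics what Pipeline does internally:

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit_transform(["a", "b", "a"])        # OK: a single data argument
lb.fit_transform(["a", "b", "a"], None)  # TypeError: fit_transform() takes 2 positional arguments but 3 were given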
Possible Solution:
This can be solved by making a custom transformer that can handle 3 positional arguments:
Import and make a new class:
from sklearn.base import TransformerMixin  # gives fit_transform method for free
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)
    def fit(self, x, y=0):
        # accept (and ignore) y so the Pipeline can pass both X and y
        self.encoder.fit(x)
        return self
    def transform(self, x, y=0):
        return self.encoder.transform(x)
Keep your code the same, only instead of using LabelBinarizer(), use the class we created: MyLabelBinarizer().
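For example, the categorical pipeline from the question then becomes:

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', MyLabelBinarizer())
])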
If you also need the attributes the binarizer learns, you can expose them by adding this line at the end of the fit method:

self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_
I believe your example is from the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. Unfortunately, I ran into this problem as well. A recent change in scikit-learn (0.19.0) changed LabelBinarizer's fit_transform method. Unfortunately, LabelBinarizer was never intended to work the way that example uses it. You can see information about the change here and here.
Until they come up with a solution for this, you can install the previous version (0.18.0) as follows:
$ pip install scikit-learn==0.18.0
After running that, your code should run without issue.
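You can confirm which version is active before re-running the code:

$ python -c "import sklearn; print(sklearn.__version__)"
0.18.0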
In the future, it looks like the correct solution may be to use a CategoricalEncoder class or something similar to that. They have been trying to solve this problem for years, apparently. You can see the new class here and further discussion of the problem here.
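For readers on newer versions: in scikit-learn 0.20 that work ultimately landed as an expanded OneHotEncoder, and ColumnTransformer replaced the DataFrameSelector/FeatureUnion pattern. Here is a minimal sketch of that approach, assuming the num_pipeline, num_attribs, cat_attribs, and housing objects from the question:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# ColumnTransformer selects the columns itself, so the DataFrameSelector
# steps inside the sub-pipelines are no longer needed
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])
housing_prepared = full_pipeline.fit_transform(housing)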
I think you are going through the examples from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow. I ran into the same problem when going through the example in Chapter 2.
As mentioned by other people, the problem is to do with sklearn's LabelBinarizer. It takes fewer args in its fit_transform method than the other transformers in the pipeline (only y, where other transformers normally take both X and y; see here for details). That's why, when we run pipeline.fit_transform, we feed more args into this transformer than it expects.
An easy fix I used is to just use OneHotEncoder and set "sparse" to False, to ensure the output is a numpy array, the same as the num_pipeline output. (This way you don't need to code up your own custom encoder.)
Your original cat_pipeline:

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])
You can simply change this part to:

from sklearn.preprocessing import OneHotEncoder

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('one_hot_encoder', OneHotEncoder(sparse=False))
])
You can go from here and everything should work.
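One caveat: OneHotEncoder only accepts string categories from scikit-learn 0.20 onward (earlier versions required integer inputs), and in scikit-learn 1.2+ the parameter is spelled sparse_output rather than sparse. With a compatible version installed, the transformation from the question runs as intended:

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)  # numeric features plus one column per category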