I have a problem with the fit_transform function. Can someone explain why the sizes of the arrays are different?
In [5]: X.shape, test.shape
Out[5]: ((1000, 1932), (1000, 1932))
In [6]: from sklearn.feature_selection import VarianceThreshold
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
features = sel.fit_transform(X)
features_test = sel.fit_transform(test)
In [7]: features.shape, features_test.shape
Out[7]: ((1000, 1663), (1000, 1665))
UPD: Which transformation can help me get arrays of the same size?
It is because you are fitting your selector twice: once on X and once on test.
First, note that fit_transform is just a call to fit followed by a call to transform.
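A quick way to check that equivalence is to compare both forms on the same data. This is a minimal sketch; the random binary matrix X_demo is a hypothetical stand-in, not the asker's data.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical binary data standing in for X (an assumption, not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.binomial(1, rng.uniform(0.02, 0.5, size=20), size=(100, 20))

threshold = 0.8 * (1 - 0.8)

# One-step form
a = VarianceThreshold(threshold=threshold).fit_transform(X_demo)
# Two-step form: fit, then transform the same data
b = VarianceThreshold(threshold=threshold).fit(X_demo).transform(X_demo)

print(np.array_equal(a, b))
```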
The fit method computes the variance of each feature in the data you pass it and, based on the threshold you gave it, decides which features to keep.
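To see why two separate fits can keep different columns, here is a small NumPy-only sketch of the fit step. It assumes VarianceThreshold's rule of keeping columns whose population variance is strictly greater than the threshold; the helper variance_mask and the tiny train/test arrays are hypothetical, built so the two fits disagree.

```python
import numpy as np

def variance_mask(data, threshold):
    """Hypothetical helper: per-column population variance, keep columns above threshold."""
    variances = np.var(data, axis=0)  # ddof=0
    return variances > threshold

threshold = 0.8 * (1 - 0.8)

# Hypothetical data: same three columns, different rows
train = np.array([[0, 1, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [1, 1, 0]])
test = np.array([[0, 1, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [0, 0, 1]])

train_mask = variance_mask(train, threshold)  # "fit" on train
test_mask = variance_mask(test, threshold)    # a second, separate "fit" on test

print(train[:, train_mask].shape, test[:, test_mask].shape)  # (4, 1) (4, 2)
# Reusing the mask learned from train keeps the column sets aligned
print(test[:, train_mask].shape)  # (4, 1)
```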
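The transform method performs the actual feature selection and returns an array with just the selected features. Calling fit_transform on test recomputes the variances on test, so it can keep a different set of features than the fit on X did. To get arrays with the same columns, fit the selector once on X and then call only transform on test.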
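A minimal sketch of fitting once on X and reusing the fitted selector on test. The random X and test arrays below are hypothetical stand-ins that share the same column layout; get_support(indices=True) is used only to show which columns were learned from X.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-ins for the asker's X and test (same columns, different rows)
rng = np.random.default_rng(0)
probs = rng.uniform(0.02, 0.5, size=50)
X = rng.binomial(1, probs, size=(1000, 50))
test = rng.binomial(1, probs, size=(1000, 50))

sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
features = sel.fit_transform(X)      # learn the kept columns from X only
features_test = sel.transform(test)  # apply that same selection to test

kept = sel.get_support(indices=True)  # column indices chosen during the fit on X
print(features.shape, features_test.shape)
```

Both arrays now have the same number of columns, and they are the same columns, which is what a model trained on features expects at prediction time.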