I'm trying to implement my own Imputer. Under certain conditions, I would like to filter out some of the training samples (the ones I deem low quality).
However, since the transform method returns only X and not y, and y itself is a numpy array (which I can't filter in place, to the best of my knowledge), and moreover, when I use GridSearchCV, the y my transform method receives is None, I can't seem to find a way to do it.
Just to clarify: I'm perfectly clear on how to filter arrays. I can't find a way to fit sample filtering on the y vector into the current API.
I really want to do this from a BaseEstimator implementation so that I can use it with GridSearchCV (it has a few parameters). Am I missing a different way to achieve sample filtering (not through BaseEstimator, but GridSearchCV compliant)? Is there some way around the current API?
The scikit-learn transformer API is designed for changing the features of the data (in nature and possibly in number/dimension), but not for changing the number of samples. Any transformer that drops or adds samples is, in the existing versions of scikit-learn, not compliant with the API (support might be added in the future if it is deemed important).
Given this, it looks like you will have to work around the standard scikit-learn API.
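One possible workaround, sketched here as an illustration rather than an official scikit-learn pattern, is to move the filtering out of transform and into the fit method of a wrapper estimator, where both X and y are available. The class name SampleFilteringClassifier, the max_missing_fraction parameter, the "too many NaN values" quality rule, and the zero-fill imputation are all assumptions made up for this example; substitute your own criterion and imputation logic.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


class SampleFilteringClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: drops low-quality rows during fit only."""

    def __init__(self, estimator=None, max_missing_fraction=0.5):
        self.estimator = estimator
        self.max_missing_fraction = max_missing_fraction

    @staticmethod
    def _fill(X):
        # Placeholder imputation (assumption): replace remaining NaN with 0.
        return np.where(np.isnan(X), 0.0, X)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Assumed quality rule: keep rows whose NaN share is within the limit.
        keep = np.isnan(X).mean(axis=1) <= self.max_missing_fraction
        self.n_samples_kept_ = int(keep.sum())
        base = LogisticRegression() if self.estimator is None else self.estimator
        # X and y are filtered with the same mask, so they stay aligned.
        self.estimator_ = clone(base).fit(self._fill(X[keep]), y[keep])
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        # No rows are dropped at prediction time; every sample gets a label.
        return self.estimator_.predict(self._fill(np.asarray(X, dtype=float)))


# Toy data: the first 10 rows have 3 of their 4 features missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
X[:10, 1:] = np.nan

model = SampleFilteringClassifier(LogisticRegression(), max_missing_fraction=0.5)
model.fit(X, y)

# The filtering threshold can be tuned like any other hyperparameter.
param_grid = {"max_missing_fraction": [0.2, 0.8], "estimator__C": [0.1, 1.0]}
search = GridSearchCV(
    SampleFilteringClassifier(LogisticRegression()), param_grid, cv=3
)
search.fit(X, y)
```

Because the wrapper only removes rows inside fit, GridSearchCV still scores each fold on all of its held-out samples, and the filtering parameters are searched together with the inner estimator's parameters through the usual estimator__ prefix.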