sklearn: Have an estimator that filters samples

Question

I'm trying to implement my own Imputer. Under certain conditions, I would like to filter out some of the training samples (those I deem low quality).

However, the transform method returns only X, not y, and y itself is a NumPy array, which to the best of my knowledge I can't filter in place. Moreover, when I use GridSearchCV, the y my transform method receives is None. So I can't seem to find a way to do it.

Just to clarify: I'm perfectly clear on how to filter arrays. I can't find a way to fit sample filtering on the y vector into the current API.

I really want to do this from a BaseEstimator implementation so that I can use it with GridSearchCV (it has a few parameters). Am I missing a different way to filter samples (not through BaseEstimator, but still GridSearchCV-compliant)? Is there some way around the current API?

eickenberg · Accepted Answer

The scikit-learn transformer API is made for changing the features of the data (in nature and possibly in number/dimension), but not for changing the number of samples. Any transformer that drops or adds samples is, as of the existing versions of scikit-learn, not compliant with the API (though this may be added in the future if deemed important).

In view of this, it looks like you will have to work around the standard scikit-learn API.
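One possible workaround, sketched here as an illustration rather than an official scikit-learn pattern, is to move the filtering out of `transform` and into the `fit` of a wrapper estimator, where both X and y are available. The class name `SampleFilteringClassifier`, its `max_missing` parameter, the "count the NaNs per row" quality rule, and the zero-filling of remaining NaNs are all assumptions made up for this example; only `BaseEstimator`, `ClassifierMixin`, `clone`, `LogisticRegression` and `GridSearchCV` are real scikit-learn names.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


class SampleFilteringClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: drops low-quality rows in fit, then delegates."""

    def __init__(self, estimator=None, max_missing=0):
        self.estimator = estimator
        self.max_missing = max_missing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Assumed quality rule: keep rows with at most `max_missing` NaNs.
        mask = np.isnan(X).sum(axis=1) <= self.max_missing
        # X and y are filtered together, which transform() cannot do.
        # Any NaNs left in the kept rows are zero-filled (a placeholder
        # for real imputation logic).
        X_kept = np.nan_to_num(X[mask])
        self.estimator_ = clone(self.estimator).fit(X_kept, y[mask])
        self.classes_ = self.estimator_.classes_
        self.n_kept_ = int(mask.sum())
        return self

    def predict(self, X):
        # No rows are dropped at prediction time; NaNs are zero-filled.
        X = np.nan_to_num(np.asarray(X, dtype=float))
        return self.estimator_.predict(X)


# Toy data: 60 samples, every fifth row has one missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)
X[::5, 1] = np.nan

# The filter threshold and the inner model's C are tuned together.
search = GridSearchCV(
    SampleFilteringClassifier(LogisticRegression()),
    param_grid={"max_missing": [0, 1], "estimator__C": [0.1, 1.0]},
    cv=3,
)
search.fit(X, y)
```

Because the filtering happens inside `fit`, it is applied only to the training portion of each cross-validation split, while the held-out samples are all scored. Whether that is the desired behaviour depends on the use case.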

Tags: python, scikit-learn

Korem

1 Answers

eickenberg
