I have been studying this example of stacking. In this case, each set of K-folds produces one column of data, and this is repeated for each classifier. That is, the matrices for blending are:
dataset_blend_train = np.zeros((X.shape[0], len(clfs)))
dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs)))
I need to stack predictions from a multiclass problem (probabilities for 15 different classes per sample). This will produce an n*15 matrix for each clf.
Should these matrices just be concatenated horizontally? Or should they be combined in some other way, before logistic regression is applied? Thanks.
You can adapt the code to the multi-class problem in two ways:

1. Widen the blend matrices to one block of numOfClasses columns per classifier, and fill each block with the output of predict_proba:

dataset_blend_train = np.zeros((X.shape[0], len(clfs)*numOfClasses))
dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs)*numOfClasses))

2. Keep one column per classifier and just use predict, so each column holds the predicted class label rather than class probabilities.

I have used both successfully, but which works better may depend on the dataset.
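A minimal sketch of option 1, assuming a toy 3-class dataset and two hypothetical base classifiers standing in for your clfs, X, y, and X_submission. Each classifier gets a horizontal block of numOfClasses columns filled with out-of-fold predict_proba outputs, so the blocks are already concatenated horizontally before the logistic regression meta-model is fit:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

num_classes = 3
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=6, n_classes=num_classes,
                           random_state=0)
X_submission = X[:50]  # stand-in for the real test set

clfs = [RandomForestClassifier(n_estimators=50, random_state=0),
        LogisticRegression(max_iter=1000)]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One block of num_classes columns per classifier.
dataset_blend_train = np.zeros((X.shape[0], len(clfs) * num_classes))
dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs) * num_classes))

for j, clf in enumerate(clfs):
    cols = slice(j * num_classes, (j + 1) * num_classes)
    # Accumulate the test-set probabilities over folds, then average.
    test_fold_sum = np.zeros((X_submission.shape[0], num_classes))
    for train_idx, holdout_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        # Out-of-fold probabilities fill this classifier's column block.
        dataset_blend_train[holdout_idx, cols] = clf.predict_proba(X[holdout_idx])
        test_fold_sum += clf.predict_proba(X_submission)
    dataset_blend_test[:, cols] = test_fold_sum / skf.get_n_splits()

# The blocks are already side by side; fit the meta-model directly on them.
meta = LogisticRegression(max_iter=1000).fit(dataset_blend_train, y)
final_pred = meta.predict(dataset_blend_test)
```

For option 2, you would keep the original `(X.shape[0], len(clfs))` shapes and assign `clf.predict(...)` into a single column per classifier instead.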