I believe SGDClassifier() with loss='log' supports multilabel classification, so I do not have to use OneVsRestClassifier. Check this
Now, my dataset is quite big, so I am using HashingVectorizer and passing the result as input to SGDClassifier. My target has 42048 features (columns).
When I run this, as follows:
clf.partial_fit(X_train_batch, y)
I get: ValueError: bad input shape (300000, 42048).
I have also tried passing the classes parameter as follows, but I still get the same error.
clf.partial_fit(X_train_batch, y, classes=np.arange(42048))
The documentation of SGDClassifier says: y : numpy array of shape [n_samples]
No, SGDClassifier does not do multilabel classification -- it does multiclass classification, which is a different problem, although both are solved using a one-vs-all problem reduction.
Also, neither SGDClassifier nor OneVsRestClassifier.fit will accept a sparse matrix for y. The former wants a 1-D array of labels, as you've already found out. The latter wants, for multilabel purposes, a list of lists of labels, e.g.
y = [[1], [2, 3], [1, 3]]
to denote that X[0] has label 1, X[1] has labels {2,3} and X[2] has labels {1,3}.
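To make the label format concrete, here is a minimal pure-Python sketch of how a list of label lists maps to a binary indicator matrix, and how a one-vs-rest reduction then sees one binary target per label. The helper name to_indicator is made up for illustration and is not part of scikit-learn. As far as I know, sklearn.preprocessing.MultiLabelBinarizer does the equivalent conversion, and recent scikit-learn releases expect the indicator form rather than the list of lists.

```python
def to_indicator(label_lists):
    """Convert a list of label lists into (classes, binary indicator rows).

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    # Collect every distinct label and give each one a column index.
    classes = sorted({label for labels in label_lists for label in labels})
    column = {label: j for j, label in enumerate(classes)}

    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1  # mark each label this sample carries
        rows.append(row)
    return classes, rows


# Multilabel target from the answer: one list of labels per sample.
y = [[1], [2, 3], [1, 3]]
classes, Y = to_indicator(y)

# One-vs-rest reduction: each column of Y is the 1-D binary target
# that a separate per-label classifier would be trained on.
per_label_targets = {
    label: [row[j] for row in Y] for j, label in enumerate(classes)
}

for label, target in per_label_targets.items():
    print(label, target)
```

Each per-label target is a plain 1-D array of shape [n_samples], which is the shape a single binary classifier expects. The original 2-D target of shape (300000, 42048) is rejected because it does not have that shape.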