SGDClassifier with predict_proba

I'm using scikit-learn to train and test a model on my data.

import csv
import pandas as pd
targetDataCsv = pd.read_csv("target.csv")
testNormalizedCsv = csv.reader(open("testdf_new.csv", "rt", encoding="utf-8"))
traningNormalizedCsv = pd.read_csv("traindf_new.csv", skiprows=1, nrows=99999)
df = pd.read_csv("testdf_new.csv", skiprows=1, nrows=9999)

I want to use the partial_fit method of SGDClassifier, since my training data has more than 200,000 rows.

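 from sklearn.linear_model import SGDClassifier
 import numpy as np
 X = traningNormalizedCsv.values
 y = targetDataCsv.values.ravel()  # partial_fit expects a 1-D target array
 clf = SGDClassifier()
 clf.partial_fit(X, y, classes=np.unique(y))  # classes is required on the first call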
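Because partial_fit is meant for training sets that are too large to load at once, it is usually called once per chunk of rows. A minimal sketch of that pattern, assuming a CSV with hypothetical feature columns f1, f2, f3 and a hypothetical target column (generated in memory with io.StringIO here instead of reading traindf_new.csv), and using the chunksize option of pd.read_csv:

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-in for a large training CSV: 3 numeric features plus
# a binary "target" column. A real file would be passed to pd.read_csv by path.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((1000, 3)), columns=["f1", "f2", "f3"])
demo["target"] = (demo["f1"] + demo["f2"] > 1.0).astype(int)
csv_text = demo.to_csv(index=False)

# partial_fit needs the full set of labels on its first call, because a
# single chunk may not contain every class. Passing the same array again
# on later calls is allowed.
all_classes = np.array([0, 1])

clf = SGDClassifier()  # default loss="hinge"
n_chunks = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    X_chunk = chunk[["f1", "f2", "f3"]].to_numpy()
    y_chunk = chunk["target"].to_numpy()
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)
    n_chunks += 1

print(n_chunks)     # 4 chunks of 250 rows each
print(clf.classes_)
```

Only one chunk of rows is held in memory at a time; the model's weights are updated incrementally across chunks.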

But predict_proba is not available for this classifier, so I can't get the target probabilities for my test data. This call fails:

   clf.predict_proba(df.values)
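A minimal reproduction on synthetic data (the features and labels below are made up for illustration): with the default loss, predict_proba raises AttributeError, while decision_function still returns raw confidence scores, which are proportional to the signed distance from the separating hyperplane rather than probabilities.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Made-up binary data for illustration only
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)

clf = SGDClassifier()  # loss defaults to "hinge" (a linear SVM)
clf.partial_fit(X, y, classes=np.array([0, 1]))

try:
    clf.predict_proba(X[:2])
    proba_available = True
except AttributeError as exc:
    # Probability estimates are not defined for the hinge loss
    proba_available = False
    print("AttributeError:", exc)

# decision_function still works: one unbounded score per sample (binary case)
scores = clf.decision_function(X[:2])
print(scores.shape)  # (2,)
```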

How can I get class probabilities from an SGDClassifier trained with partial_fit?

rathna asked Sep 10 '25 18:09


1 Answer

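As the documentation says, predict_proba is only available for log loss (loss="log_loss", spelled loss="log" before scikit-learn 1.1) and modified Huber loss (loss="modified_huber").

So you have to change the loss function. The default, loss="hinge", fits a linear SVM, which does not produce probability estimates.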

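from sklearn.linear_model import SGDClassifier
import numpy as np

# Random demo data: 1000 samples with 3 features, labels drawn from 0..3
X = np.random.random_sample((1000, 3))
y = np.random.binomial(3, 0.5, 1000)

# modified_huber (or log_loss) enables predict_proba; the default hinge loss does not
model = SGDClassifier(loss="modified_huber")
# classes must be passed on the first call to partial_fit
model.partial_fit(X, y, classes=np.unique(y))
print(model.predict_proba([[0.5, 0.6, 0.7]]))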

Example output (the exact values vary because the data are random): [[ 0. 0. 1. 0.]]

Anton Alekseev answered Sep 13 '25 08:09