
partial_fit Sklearn's MLPClassifier

I've been trying to use Sklearn's neural network MLPClassifier. I have a dataset of 1000 instances with binary outputs, and I want to fit a basic neural net with one hidden layer to it.

The issue is that my data instances are not all available at the same time. At any point in time, I only have access to one data instance. I thought the partial_fit method of MLPClassifier could be used for this, so I simulated the problem with an imaginary dataset of 1000 inputs, looping over the inputs one at a time and calling partial_fit on each instance. When I run the code, however, the neural net learns nothing and the predicted output is all zeros.

I have no idea what might be causing the problem. Any thoughts are hugely appreciated.

from __future__ import division 
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

#Creating an imaginary dataset
input, output = make_classification(1000, 30, n_informative=10, n_classes=2)
input= input / input.max(axis=0)
N = input.shape[0]
# Use floor division: with "from __future__ import division", N/2 is a float
train_input = input[0:N//2,:]
train_target = output[0:N//2]

test_input= input[N//2:N,:]
test_target = output[N//2:N]

#Creating and training the Neural Net
clf = MLPClassifier(activation='tanh', algorithm='sgd', learning_rate='constant',
 alpha=1e-4, hidden_layer_sizes=(15,), random_state=1, batch_size=1,verbose= True,
 max_iter=1, warm_start=True)
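for j in xrange(0,100):
    for i in xrange(0,train_input.shape[0]):
        input_inst = [train_input[i,:]]
        input_inst = np.asarray(input_inst)
        target_inst= [train_target[i]]
        target_inst = np.asarray(target_inst)
        # Train on one instance at a time, telling partial_fit which classes exist
        clf = clf.partial_fit(input_inst, target_inst, classes=[0, 1])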

#Testing the Neural Net
y_pred = clf.predict(test_input)
print y_pred
Bita asked Mar 02 '16 19:03


1 Answer

Explanation of the problem

The problem is with self.label_binarizer_.fit(y) in line 895 in multilayer_perceptron.py.

Every call to clf.partial_fit(input_inst,target_inst,classes) ends up calling self.label_binarizer_.fit(y), and in this case y holds a single sample, hence a single class. Each call refits the binarizer from scratch, so it only remembers the class of the most recent sample. Therefore, if the last sample is of class 0, your clf will classify everything as class 0.


As a temporary fix, you can edit multilayer_perceptron.py at line 895. The file is found in a directory similar to python2.7/site-packages/sklearn/neural_network/

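At line 895, change

self.label_binarizer_.fit(y)

to

if not incremental:
    self.label_binarizer_.fit(y)
else:
    self.label_binarizer_.fit(self.classes_)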


That way, if you are using partial_fit, then self.label_binarizer_ fits on the classes rather than on the individual sample.
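To make the effect of that guard concrete, here is a minimal pure-Python sketch. ToyBinarizer and fit_labels are hypothetical stand-ins written for illustration only, not sklearn code; the one property they are assumed to share with the real label binarizer is that fit() overwrites whatever was learned before instead of accumulating it.

```python
class ToyBinarizer:
    """Hypothetical stand-in for a label binarizer: fit() overwrites, never accumulates."""

    def fit(self, labels):
        self.classes_ = sorted(set(labels))
        return self


def fit_labels(binarizer, y, known_classes, incremental, patched):
    """Mimic the label-fitting step before and after the suggested change."""
    if patched and incremental:
        binarizer.fit(known_classes)  # patched: fit on the full class list
    else:
        binarizer.fit(y)              # original: fit on the labels just seen
    return binarizer.classes_


stream = [1, 0, 1, 1, 0]  # labels arriving one at a time; the last one is class 0

unpatched, patched = ToyBinarizer(), ToyBinarizer()
for label in stream:
    before = fit_labels(unpatched, [label], [0, 1], incremental=True, patched=False)
    after = fit_labels(patched, [label], [0, 1], incremental=True, patched=True)

print(before)  # [0]    -> only the last sample's class survives
print(after)   # [0, 1] -> both classes are still known
```

Without the guard, the set of known classes collapses to whatever the last sample happened to be, which matches the all-zeros predictions described in the question.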

Further, the code you posted can be changed to the following to make it work,

from __future__ import division 
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

#Creating an imaginary dataset
input, output = make_classification(1000, 30, n_informative=10, n_classes=2)
input= input / input.max(axis=0)
N = input.shape[0]
# Use floor division: with "from __future__ import division", N/2 is a float
train_input = input[0:N//2,:]
train_target = output[0:N//2]

test_input= input[N//2:N,:]
test_target = output[N//2:N]

#Creating and training the Neural Net 
# 1. Disable verbose (verbose is annoying with partial_fit)

clf = MLPClassifier(activation='tanh', algorithm='sgd', learning_rate='constant',
 alpha=1e-4, hidden_layer_sizes=(15,), random_state=1, batch_size=1,verbose= False,
 max_iter=1, warm_start=True)

# 2. Set what the classes are
clf.classes_ = [0,1]

for j in xrange(0,100):
    for i in xrange(0,train_input.shape[0]):
        input_inst = train_input[[i]]
        target_inst = train_target[[i]]
        # Train on the single instance
        clf = clf.partial_fit(input_inst, target_inst)

    # 3. Monitor progress
    print "Score on training set: %0.8f" % clf.score(train_input, train_target)
#Testing the Neural Net
y_pred = clf.predict(test_input)
print y_pred

# 4. Compute score on testing set
print clf.score(test_input, test_target)

There are 4 main changes in the code. This should give you a good prediction on both the training and the testing set!
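For readers on a released version of scikit-learn, the same experiment can probably be run without editing the library. The sketch below is an assumption-laden Python 3 port, not the code from this answer: as far as I know the constructor argument is called solver rather than algorithm in released versions, and partial_fit accepts a classes argument listing every label in advance. The names X_train, classes and n_epochs, the fixed random_state on the dataset, and the small epoch count are my own choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data, scaled and split in half as in the post.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           n_classes=2, random_state=0)
X = X / X.max(axis=0)
half = X.shape[0] // 2
X_train, y_train = X[:half], y[:half]
X_test, y_test = X[half:], y[half:]

clf = MLPClassifier(hidden_layer_sizes=(15,), activation='tanh', solver='sgd',
                    learning_rate='constant', alpha=1e-4, random_state=1)

classes = np.unique(y)  # every label the stream can produce, known up front
n_epochs = 5            # assumption: kept small so the sketch runs quickly

for epoch in range(n_epochs):
    for i in range(X_train.shape[0]):
        # X_train[[i]] keeps the 2-D shape (1, n_features) that partial_fit expects.
        clf.partial_fit(X_train[[i]], y_train[[i]], classes=classes)
    print("epoch %d, training accuracy: %.3f" % (epoch, clf.score(X_train, y_train)))

y_pred = clf.predict(X_test)
print("test accuracy: %.3f" % clf.score(X_test, y_test))
```

Passing classes on every call is redundant after the first one but harmless as long as the list stays the same. The accuracy you get will depend on the number of epochs and on the learning rate.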


IssamLaradji answered Oct 18 '22 20:10
