Neural network always predicts the same class

Tags: python-3.x, neural-network, numpy, deep-learning, gradient-descent

I'm trying to implement a neural network that classifies images into one of two discrete categories. The problem is that it currently predicts 0 for every input, and I'm not sure why.

Here's my feature extraction method:

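import cv2
import numpy as np

def extract(file):
    """Load an image and return it as a single normalized feature row."""
    # Resize to 224x224 and subtract a per-channel mean pixel value
    # (cv2.imread returns the channels in BGR order)
    img = cv2.resize(cv2.imread(file), (224, 224)).astype(np.float32)
    img[:, :, 0] -= 103.939
    img[:, :, 1] -= 116.779
    img[:, :, 2] -= 123.68
    # Normalize features to zero mean and unit variance
    img = (img.flatten() - np.mean(img)) / np.std(img)

    # One row per image, i.e. shape (1, 224 * 224 * 3)
    return np.array([img])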

Here's my gradient descent routine:

def fit(x, y, t1, t2):
    """Training routine"""
    ils = x.shape[1] if len(x.shape) > 1 else 1
    labels = len(set(y))

    if t1 is None or t2 is None:
        t1 = randweights(ils, 10)
        t2 = randweights(10, labels)

    params = np.concatenate([t1.reshape(-1), t2.reshape(-1)])
    res = grad(params, ils, 10, labels, x, y)
    params -= 0.1 * res

    return unpack(params, ils, 10, labels)

Here are my forward and backward (gradient) propagation routines:

def forward(x, theta1, theta2):
    """Forward propagation"""

    m = x.shape[0]

    # Forward prop
    a1 = np.vstack((np.ones([1, m]), x.T))
    z2 = np.dot(theta1, a1)

    a2 = np.vstack((np.ones([1, m]), sigmoid(z2)))
    a3 = sigmoid(np.dot(theta2, a2))

    return (a1, a2, a3, z2, m)

def grad(params, ils, hls, labels, x, Y, lmbda=0.01):
    """Compute gradient for hypothesis Theta"""

    theta1, theta2 = unpack(params, ils, hls, labels)

    a1, a2, a3, z2, m = forward(x, theta1, theta2)
    d3 = a3 - Y.T
    print('Current error: {}'.format(np.mean(np.abs(d3))))

    d2 = np.dot(theta2.T, d3) * (np.vstack([np.ones([1, m]), sigmoid_prime(z2)]))
    d3 = d3.T
    d2 = d2[1:, :].T

    t1_grad = np.dot(d2.T, a1.T)
    t2_grad = np.dot(d3.T, a2.T)

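    # Zero part of each weight matrix before adding the regularization term.
    # Note: index [0] selects the first row. Given np.dot(theta1, a1) in
    # forward(), the bias weights sit in column 0, i.e. theta1[:, 0].
    theta1[0] = np.zeros([1, theta1.shape[1]])
    theta2[0] = np.zeros([1, theta2.shape[1]])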

    t1_grad = t1_grad + (lmbda / m) * theta1
    t2_grad = t2_grad + (lmbda / m) * theta2

    return np.concatenate([t1_grad.reshape(-1), t2_grad.reshape(-1)])

And here's my prediction function:

def predict(theta1, theta2, x):
    """Predict output using learned weights"""
    m = x.shape[0]

    h1 = sigmoid(np.hstack((np.ones([m, 1]), x)).dot(theta1.T))
    h2 = sigmoid(np.hstack((np.ones([m, 1]), h1)).dot(theta2.T))

    return h2.argmax(axis=1)

I can see that the error rate is gradually decreasing with each iteration, generally converging somewhere around 1.26e-05.

What I've tried so far:

PCA
Different datasets (Iris from sklearn and handwritten digits from the Coursera ML course, achieving about 95% accuracy on both). However, both of those were processed in a batch, so I assume my general implementation is correct and that something is wrong either with how I extract features or with how I train the classifier.
Tried sklearn's SGDClassifier and it didn't perform much better, giving me ~50% accuracy. So is something wrong with the features, then?

Edit: A typical output of h2 looks like the following:

[0.5004899   0.45264441]
[0.50048522  0.47439413]
[0.50049019  0.46557124]
[0.50049261  0.45297816]

So, very similar sigmoid outputs for all validation examples.

asked Jan 05 '17 15:01

Yurii Dolhikh

4 Answers

My network does always predict the same class. What is the problem?

I had this a couple of times. Although I'm currently too lazy to go through your code, I think I can give some general hints which might also help others who have the same symptom but probably different underlying problems.

Debugging Neural Networks

Fitting one item datasets

For every class i the network should be able to predict, try the following:

Create a dataset of only one data point of class i.
Fit the network to this dataset.
Does the network learn to predict "class i"?
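
The three steps above can be sketched roughly as follows. This is only a minimal stand-in, not the asker's network: it uses a single sigmoid layer in plain Python, and the names fit_single_point, steps, lr and the sample feature values are all made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_single_point(x, label, n_classes, steps=500, lr=0.5, seed=0):
    """Train a one-layer sigmoid classifier on ONE example; return its outputs."""
    rng = random.Random(seed)
    # One weight row per class; the first entry of each row is the bias.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(len(x) + 1)]
               for _ in range(n_classes)]
    target = [1.0 if k == label else 0.0 for k in range(n_classes)]
    inputs = [1.0] + list(x)
    for _ in range(steps):
        for k in range(n_classes):
            out = sigmoid(sum(w * v for w, v in zip(weights[k], inputs)))
            delta = out - target[k]  # cross-entropy gradient w.r.t. the pre-activation
            weights[k] = [w - lr * delta * v for w, v in zip(weights[k], inputs)]
    return [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in weights]

# Hypothetical feature vector; repeat the check once per class.
for label in range(2):
    outputs = fit_single_point([0.3, -1.2, 0.8], label, n_classes=2)
    predicted = outputs.index(max(outputs))
    print("class", label, "predicted", predicted, [round(o, 3) for o in outputs])
```

If even a model this small cannot memorize a single point, the problem is more likely in the training code than in the data.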

If this doesn't work, there are four possible error sources:

Buggy training algorithm: Try a smaller model, print many of the intermediate values, and check whether they match your expectations.
  1. Dividing by 0: Add a small number to the denominator.
  2. Logarithm of 0 / a negative number: Same remedy as for dividing by 0.
Data: It is possible that your data has the wrong type. For example, your data might need to be of type float32 but actually be an integer.
Model: It is also possible that you created a model which cannot possibly predict what you want. This should be revealed when you try simpler models.
Initialization / Optimization: Depending on the model, your initialization and your optimization algorithm might play a crucial role. For beginners who use standard stochastic gradient descent, I would say it is mainly important to initialize the weights randomly (each weight a different value). See also: this question / answer
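
For the "dividing by 0" and "logarithm of 0" items, a guard could look something like the sketch below. The function names and the value of EPS are my own assumptions; any small positive constant should do.

```python
import math

EPS = 1e-12  # assumed value, chosen only for illustration

def safe_log_loss(y_true, y_prob, eps=EPS):
    """Mean binary cross-entropy with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keeps both log() arguments positive
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def safe_standardize(values, eps=EPS):
    """Standardize values; eps in the denominator avoids dividing by a zero std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]

print(safe_log_loss([1, 0], [1.0, 0.0]))    # finite, although log(0) would otherwise occur
print(safe_standardize([5.0, 5.0, 5.0]))    # constant input has a zero std
```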

Learning Curve

See sklearn for details.

[Figure: learning curve in which the training error and test error curves approach each other]

The idea is to start with a tiny training dataset (probably only one item). The model should then be able to fit the data perfectly. If this works, make the dataset slightly larger. At some point your training error should go up slightly. This reveals your model's capacity to model the data.
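
A hand-rolled version of that procedure might look like the sketch below (sklearn's learning_curve does this properly). Everything here is hypothetical: the synthetic 1-D data, the nearest-centroid stand-in model, and the chosen subset sizes.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D data: class 0 is centred at -1, class 1 at +1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label - 1.0, 1.0), label))
    return data

def fit_centroids(train):
    """Stand-in model: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def error_rate(centroids, data):
    """Fraction of points whose nearest centroid carries the wrong label."""
    wrong = 0
    for x, y in data:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        wrong += predicted != y
    return wrong / len(data)

train, test = make_data(200, seed=1), make_data(200, seed=2)
curve = []
for size in (1, 2, 5, 20, 200):
    subset = train[:size]  # grow the training set step by step
    model = fit_centroids(subset)
    curve.append((size, error_rate(model, subset), error_rate(model, test)))

for size, train_err, test_err in curve:
    print(size, round(train_err, 3), round(test_err, 3))
```

With a single training item the training error is zero by construction; how the two columns move as the subset grows is what the learning curve plots.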

Data analysis

Check how often the other class(es) appear. If one class dominates the others (e.g. one class is 99.9% of the data), this is a problem. Look for "outlier detection" techniques.
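
Checking the class frequencies takes only a few lines. The label list below is invented to mirror the 99.9% example; class_distribution is just a helper name I picked.

```python
from collections import Counter

def class_distribution(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical labels: 999 samples of class 0 and a single sample of class 1.
labels = [0] * 999 + [1]
dist = class_distribution(labels)

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(dist.values())
print(dist, majority_baseline)
```

If the accuracy of your network equals this baseline, it has probably just learned to output the majority class.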

Learning rate: If your network doesn't improve and gets only slightly better than random chance, try reducing the learning rate. For computer vision, a learning rate of 0.001 often works. This is also relevant if you use Adam as an optimizer.
Preprocessing: Make sure you use the same preprocessing for training and testing. You might see differences in the confusion matrix (see this question).
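
One way to keep preprocessing identical is to compute the statistics on the training set once and reuse them everywhere. The sketch below assumes per-feature standardization; fit_scaler, transform and the toy rows are hypothetical.

```python
import math

def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation on the TRAINING rows only."""
    n = len(train_rows)
    n_features = len(train_rows[0])
    mean = [sum(row[j] for row in train_rows) / n for j in range(n_features)]
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in train_rows) / n)
           for j in range(n_features)]
    return mean, std

def transform(rows, mean, std, eps=1e-8):
    """Apply the stored statistics; eps guards against a zero standard deviation."""
    return [[(v - m) / (s + eps) for v, m, s in zip(row, mean, std)] for row in rows]

train_rows = [[1.0, 10.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]

mean, std = fit_scaler(train_rows)
train_scaled = transform(train_rows, mean, std)
test_scaled = transform(test_rows, mean, std)  # same mean/std as for training
print(train_scaled, test_scaled)
```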

Common Mistakes

This is inspired by reddit:

You forgot to apply preprocessing
Dying ReLU
Too small / too big learning rate
Wrong activation function in final layer:
- Your targets do not sum to one? -> Don't use softmax
- Single elements of your targets are negative -> Don't use softmax, ReLU, or sigmoid. tanh might be an option
Network too deep: Training fails. Try a simpler neural network first.
Vastly unbalanced data: You might want to look into imbalanced-learn

answered Sep 27 '22 23:09

Martin Thoma

After a week and a half of research I think I understand what the issue is. There is nothing wrong with the code itself. The only two issues that prevent my implementation from classifying successfully are time spent learning and proper selection of learning rate / regularization parameters.

I've had the learning routine running for some time now, and it's pushing 75% accuracy already, though there is still plenty of room for improvement.

answered Sep 27 '22 23:09

Yurii Dolhikh

Same happened to me. I had an imbalanced dataset (about 66%-33% sample distribution between classes 0 and 1, respectively) and the net was always outputting 0.0 for all samples after the first iteration.

My problem was simply a learning rate that was too high. Switching it to 1e-05 solved the issue.

More generally, I suggest printing the following before the parameter update:

your net output (for one batch)
the corresponding label (for the same batch)
the value of the loss (on the same batch) either sample by sample or aggregated.

And then check the same three items after the parameter update. What you should see in the next batch is a gradual change in the net output. When my learning rate was too high, already in the second iteration the net output would shoot to either all 1.0s or all 0.0s for all samples in the batch.
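
Here is a toy illustration of that check. It is not my actual network: it is a single logistic unit in plain Python with an invented batch, and the two learning rates are picked only to make the effect obvious.

```python
import math

def sigmoid(z):
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(w, b, batch):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in batch]

def mean_loss(outputs, labels, eps=1e-12):
    total = 0.0
    for o, y in zip(outputs, labels):
        o = min(max(o, eps), 1.0 - eps)
        total -= y * math.log(o) + (1 - y) * math.log(1.0 - o)
    return total / len(labels)

def sgd_step(w, b, batch, labels, lr):
    """One gradient descent step on the mean cross-entropy loss."""
    outputs = forward(w, b, batch)
    n = len(batch)
    grad_w = [sum((o - y) * x[j] for o, y, x in zip(outputs, labels, batch)) / n
              for j in range(len(w))]
    grad_b = sum(o - y for o, y in zip(outputs, labels)) / n
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b

# Hypothetical imbalanced batch: three samples of class 0, one of class 1.
batch = [[2.0, 1.0], [1.5, 2.5], [3.0, 0.5], [0.5, 3.0]]
labels = [0, 0, 0, 1]

results = {}
for lr in (10.0, 0.01):
    w, b = [0.0, 0.0], 0.0
    before = forward(w, b, batch)
    w, b = sgd_step(w, b, batch, labels, lr)
    after = forward(w, b, batch)
    results[lr] = after
    print("lr", lr, "labels", labels)
    print("  before:", [round(o, 3) for o in before], "loss", round(mean_loss(before, labels), 3))
    print("  after: ", [round(o, 3) for o in after], "loss", round(mean_loss(after, labels), 3))
```

With the large learning rate, every output collapses towards 0.0 after a single update. With the small one, the outputs move only slightly away from 0.5.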

answered Sep 28 '22 00:09

Tommaso Di Noto

Same happened to me. Mine was with the deeplearning4j Java library for image classification. For every test, it kept giving the output for the last training folder. I was able to solve it by decreasing the learning rate.

Approaches that can be used:

Lowering the learning rate. (Mine was 0.01 at first; lowering it to 1e-4 worked.)
Increasing the batch size. (Sometimes stochastic gradient descent doesn't work; then you can try a larger batch size: 32, 64, 128, 256, ...)
Shuffling the training data.
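
The last two points can be combined in a small batch iterator. This is a plain Python sketch, not deeplearning4j code; iterate_minibatches and its parameters are hypothetical names.

```python
import random

def iterate_minibatches(samples, labels, batch_size, seed=None):
    """Yield (samples, labels) mini-batches in a freshly shuffled order."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # mixes classes that were stored folder by folder
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [samples[i] for i in chunk], [labels[i] for i in chunk]

# Hypothetical data sorted by class, as it would be when read folder by folder.
samples = list(range(10))
labels = [0] * 5 + [1] * 5

batches = list(iterate_minibatches(samples, labels, batch_size=4, seed=0))
for xs, ys in batches:
    print(xs, ys)
```

Calling it again with a different seed for each epoch gives a new order every time, so the last batches are not always from the same folder.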

answered Sep 28 '22 00:09

Urmay Shah
