I'm trying to tackle the classic handwritten digit recognition problem with a feedforward neural network and backpropagation, using the MNIST dataset. I'm using Michael Nielsen's book to learn the essentials and 3Blue1Brown's YouTube video for the backpropagation algorithm.
I finished writing it some time ago and have been debugging it since, because the results are quite bad. At its best the network recognizes ~4000/10000 test samples after 1 epoch, and that number only drops on the following epochs, which led me to believe there's some issue with the backpropagation algorithm. I've been drowning in index hell trying to debug this for the last few days and can't figure out where the issue is; I'd appreciate any help in pointing it out.
A bit of background: 1) I'm not using any matrix multiplication or external frameworks; everything is done with for loops, because that's how I learned it from the video. 2) Unlike the book, I'm storing the weights and biases in the same array: the biases for every layer are a column at the end of that layer's weight matrix (see the sketch below).
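To make that concrete: for a layer with nIn inputs and nOut neurons the weight matrix is nOut × (nIn + 1), and the last column plays the role of the bias vector. Roughly like this (a sketch just to illustrate the layout, the names are illustrative and not from the repo):

// One layer with nIn = 784 inputs and nOut = 30 neurons:
// weights[i, j] for j < nIn is the weight from input j to neuron i,
// weights[i, nIn] is the bias of neuron i.
int nIn = 784, nOut = 30;
double[,] weights = new double[nOut, nIn + 1];

double[] a = new double[nIn]; // activations of the previous layer
double[] z = new double[nOut]; // weighted inputs of this layer
for (int i = 0; i < nOut; i++)
{
    double value = weights[i, nIn]; // the bias sits in the last column
    for (int j = 0; j < nIn; j++)
        value += weights[i, j] * a[j];
    z[i] = value;
}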
And finally for the code, this is the Backpropagate method of the NeuralNetwork class, which is called in UpdateMiniBatch, which itself is called in SGD:
/// <summary>
/// Returns the partial derivative of the cost function on one sample with respect to every weight in the network.
/// </summary>
public List<double[,]> Backpropagate(ITrainingSample sample)
{
    // Forward pass
    var (weightedInputs, activations) = GetWeightedInputsAndActivations(sample.Input);
    // The derivative with respect to the activation of the last layer is simple to compute: activation - expectedActivation
    var errors = activations.Last().Select((a, i) => a - sample.Output[i]).ToArray();
    // Backward pass
    List<double[,]> delCostOverDelWeights = Weights.Select(x => new double[x.GetLength(0), x.GetLength(1)]).ToList();
    List<double[]> delCostOverDelActivations = Weights.Select(x => new double[x.GetLength(0)]).ToList();
    delCostOverDelActivations[delCostOverDelActivations.Count - 1] = errors;
    // Comment notation:
    // Cost function: C
    // Weight connecting the i-th neuron on the (l + 1)-th layer to the j-th neuron on the l-th layer: w[l][i, j]
    // Bias of the i-th neuron on the (l + 1)-th layer: b[l][i]
    // Activation of the i-th neuron on the l-th layer: a[l][i]
    // Weighted input of the i-th neuron on the l-th layer: z[l][i] // which doesn't make sense on layer 0, but is left for index convenience
    // Notice that weights, biases, delCostOverDelWeights and delCostOverDelActivations all start at layer 1 (the 0-th layer is irrelevant to their meanings) while activations and weightedInputs start at the 0-th layer
    for (int l = Weights.Count - 1; l >= 0; l--)
    {
        // Calculate ∂C/∂w for the current layer:
        for (int i = 0; i < Weights[l].GetLength(0); i++)
            for (int j = 0; j < Weights[l].GetLength(1); j++)
                delCostOverDelWeights[l][i, j] =                               // ∂C/∂w[l][i, j]
                    delCostOverDelActivations[l][i] *                          // ∂C/∂a[l + 1][i]
                    SigmoidPrime(weightedInputs[l + 1][i]) *                   // ∂a[l + 1][i]/∂z[l + 1][i] = ∂(σ(z[l + 1][i]))/∂z[l + 1][i] = σ′(z[l + 1][i])
                    (j < Weights[l].GetLength(1) - 1 ? activations[l][j] : 1); // ∂z[l + 1][i]/∂w[l][i, j] = a[l][j] ||OR|| ∂z[l + 1][i]/∂b[l][i] = 1
        // Calculate ∂C/∂a for the previous layer (a[l]):
        if (l != 0)
            for (int i = 0; i < Weights[l - 1].GetLength(0); i++)
                for (int j = 0; j < Weights[l].GetLength(0); j++)
                    delCostOverDelActivations[l - 1][i] +=                     // ∂C/∂a[l][i] = sum over j:
                        delCostOverDelActivations[l][j] *                      // ∂C/∂a[l + 1][j]
                        SigmoidPrime(weightedInputs[l + 1][j]) *                // ∂a[l + 1][j]/∂z[l + 1][j] = ∂(σ(z[l + 1][j]))/∂z[l + 1][j] = σ′(z[l + 1][j])
                        Weights[l][j, i];                                       // ∂z[l + 1][j]/∂a[l][i] = w[l][j, i]
    }
    return delCostOverDelWeights;
}
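For context, UpdateMiniBatch isn't shown here; it essentially averages these per-sample gradients over the mini-batch and takes a gradient-descent step on the combined weight/bias matrices. Roughly like this (a simplified sketch of the idea, not the exact code from the repo):

public void UpdateMiniBatch(IEnumerable<ITrainingSample> miniBatch, double learningRate)
{
    var samples = miniBatch.ToList();
    // Accumulate ∂C/∂w over the mini-batch
    List<double[,]> totalGradient = Weights.Select(w => new double[w.GetLength(0), w.GetLength(1)]).ToList();
    foreach (var sample in samples)
    {
        var gradient = Backpropagate(sample);
        for (int l = 0; l < Weights.Count; l++)
            for (int i = 0; i < Weights[l].GetLength(0); i++)
                for (int j = 0; j < Weights[l].GetLength(1); j++)
                    totalGradient[l][i, j] += gradient[l][i, j];
    }
    // Gradient-descent step: w -= (eta / batchSize) * ∂C/∂w
    // (the biases live in the last column, so they get updated here too)
    for (int l = 0; l < Weights.Count; l++)
        for (int i = 0; i < Weights[l].GetLength(0); i++)
            for (int j = 0; j < Weights[l].GetLength(1); j++)
                Weights[l][i, j] -= learningRate / samples.Count * totalGradient[l][i, j];
}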
GetWeightedInputsAndActivations:
public (List<double[]>, List<double[]>) GetWeightedInputsAndActivations(double[] input)
{
    List<double[]> activations = new List<double[]>() { input }.Concat(Weights.Select(x => new double[x.GetLength(0)])).ToList();
    List<double[]> weightedInputs = activations.Select(x => new double[x.Length]).ToList();
    for (int l = 0; l < Weights.Count; l++)
        for (int i = 0; i < Weights[l].GetLength(0); i++)
        {
            double value = 0;
            for (int j = 0; j < Weights[l].GetLength(1) - 1; j++)
                value += Weights[l][i, j] * activations[l][j]; // weights
            weightedInputs[l + 1][i] = value + Weights[l][i, Weights[l].GetLength(1) - 1]; // bias
            activations[l + 1][i] = Sigmoid(weightedInputs[l + 1][i]);
        }
    return (weightedInputs, activations);
}
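For completeness, Sigmoid and SigmoidPrime are the standard logistic function and its derivative, something like this (a sketch; the exact implementation lives in the repo):

private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

// σ′(z) = σ(z) * (1 - σ(z))
private static double SigmoidPrime(double z)
{
    double s = Sigmoid(z);
    return s * (1 - s);
}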
The entire NeuralNetwork as well as everything else can be found here.
EDIT: After many significant changes to the repo, the above link may no longer work, but it should be irrelevant given the answer. For completeness' sake, this is a working link to the changed repository.
Fixed. The issue was that I didn't divide the pixel inputs by 255. Everything else seems to work correctly, and I'm now getting 9000+/10000 on the first epoch.
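For anyone hitting the same wall: the raw MNIST pixel values are bytes in 0–255, which immediately saturates the sigmoids; scaling them into [0, 1] when building each training sample is all it takes. A minimal sketch (the helper name is just illustrative):

// rawPixels: the 784 bytes of one MNIST image, values 0–255
double[] NormalizeInput(byte[] rawPixels)
{
    // Scale each pixel into [0, 1] so the weighted inputs don't saturate the sigmoid
    return rawPixels.Select(p => p / 255.0).ToArray();
}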