I am trying to implement mini-batch training for my neural network instead of the "online" stochastic method of updating the weights after every training sample.
I have developed a somewhat novice neural network in C in which I can adjust the number of neurons in each layer, the activation functions, etc. This is to help me understand neural networks. I have trained the network on the MNIST data set, but it takes around 200 epochs to get down to an error rate of 20% on the training set, which seems very poor to me. I am currently using online stochastic gradient descent to train the network. What I would like to try is to use mini-batches instead. As I understand it, I must accumulate and average the error from each training sample before I propagate the error back. My problem comes when I want to calculate the changes I must make to the weights. To explain this better, consider a very simple perceptron model: one input, one hidden layer, one output. To calculate the change I need to make to the weight between the input and the hidden unit, I use the following equation:
∂C/∂w1 = ∂C/∂O * ∂O/∂h * ∂h/∂w1
If you do the partial derivatives (for a squared-error cost and linear units) you get:
∂C/∂w1 = (Output - Expected Answer)(w2)(input)
Now this formula says that you need to multiply the backpropagated error by the input. For online stochastic training that makes sense, because you use one input per weight update. For mini-batch training you use many inputs, so which input does the error get multiplied by? I hope you can assist me with this.
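For concreteness, the single-sample formula above can be sketched in C. This is only an illustration for the toy one-input, one-hidden-unit, one-output model, assuming linear units and a squared-error cost; the function name `grad_w1` is hypothetical and not part of the network code in this post.

```c
/* Toy 1-1-1 network, assuming linear units and the squared-error cost
   C = 0.5 * (O - y)^2, with h = w1 * x and O = w2 * h.
   grad_w1 is a hypothetical helper used only for illustration. */
double grad_w1(double x, double y, double w1, double w2) {
    double h = w1 * x;         /* hidden unit output */
    double O = w2 * h;         /* network output */
    return (O - y) * w2 * x;   /* dC/dO * dO/dh * dh/dw1 */
}
```

With x = 2, y = 1, w1 = 0.5 and w2 = 3, the output is 3 and the gradient is (3 - 1) * 3 * 2 = 12. The input that appears in the product is the input of the one sample that produced the error.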
void propogateBack(void){
    //calculate dC/dG
    for (count=0;count<network.outputs;count++){
        network.g_error[count] = derive_cost((training.answer[training_current])-(network.g[count]));
    }
    //calculate dG/dO
    for (count=0;count<network.outputs;count++){
        network.o_error[count] = derive_activation(network.g[count])*(network.g_error[count]);
    }
    //calculate dO/dS3
    for (count=0;count<network.h3_neurons;count++){
        network.s3_error[count] = 0;
        for (count2=0;count2<network.outputs;count2++){
            network.s3_error[count] += (network.w4[count2][count])*(network.o_error[count2]);
        }
    }
    //calculate dS3/dH3
    for (count=0;count<network.h3_neurons;count++){
        network.h3_error[count] = (derive_activation(network.s3[count]))*(network.s3_error[count]);
    }
    //calculate dH3/dS2
    for (count=0;count<network.h2_neurons;count++){
        network.s2_error[count] = 0; //reset the accumulator for each neuron
        for (count2=0;count2<network.h3_neurons;count2++){
            network.s2_error[count] += (network.w3[count2][count])*(network.h3_error[count2]);
        }
    }
    //calculate dS2/dH2
    for (count=0;count<network.h2_neurons;count++){
        network.h2_error[count] = (derive_activation(network.s2[count]))*(network.s2_error[count]);
    }
    //calculate dH2/dS1
    for (count=0;count<network.h1_neurons;count++){
        network.s1_error[count] = 0; //reset the accumulator for each neuron
        for (count2=0;count2<network.h2_neurons;count2++){
            network.s1_error[count] += (network.w2[count2][count])*(network.h2_error[count2]);
        }
    }
    //calculate dS1/dH1
    for (count=0;count<network.h1_neurons;count++){
        network.h1_error[count] = (derive_activation(network.s1[count]))*(network.s1_error[count]);
    }
}
void updateWeights(void){
    //////////////////w1
    for(count=0;count<network.h1_neurons;count++){
        for(count2=0;count2<network.inputs;count2++){
            network.w1[count][count2] -= learning_rate*(network.h1_error[count]*network.input[count2]);
        }
    }
    //////////////////w2
    for(count=0;count<network.h2_neurons;count++){
        for(count2=0;count2<network.h1_neurons;count2++){
            network.w2[count][count2] -= learning_rate*(network.h2_error[count]*network.s1[count2]);
        }
    }
    //////////////////w3
    for(count=0;count<network.h3_neurons;count++){
        for(count2=0;count2<network.h2_neurons;count2++){
            network.w3[count][count2] -= learning_rate*(network.h3_error[count]*network.s2[count2]);
        }
    }
    //////////////////w4
    for(count=0;count<network.outputs;count++){
        for(count2=0;count2<network.h3_neurons;count2++){
            network.w4[count][count2] -= learning_rate*(network.o_error[count]*network.s3[count2]);
        }
    }
}
The code I have attached is how I do the online stochastic updates. As you can see in the updateWeights() function, the weight updates are based on the input values (which depend on the sample fed in) and the hidden unit values (which also depend on the sample fed in). So when I have the mini-batch average gradient that I am propagating back, how do I update the weights? Which input values do I use?
OK, so I figured it out. When using mini-batches you should not accumulate and average the error at the output of the network. Each training example's error gets propagated back as you normally would, except that instead of updating the weights you accumulate the changes you would have made to each weight. When you have looped through the mini-batch, you average the accumulated changes and update the weights accordingly.
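In terms of the toy formula from the question, this can be written out as follows. It is a sketch that assumes the batch cost is the mean of the per-sample costs over m samples, with linear units and a squared-error cost; the symbols m, x_i, y_i and O_i (input, expected answer and output for sample i) are introduced here for illustration.

```latex
\frac{\partial C_{\text{batch}}}{\partial w_1}
  = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial C_i}{\partial w_1}
  = \frac{1}{m}\sum_{i=1}^{m}\,(O_i - y_i)\,w_2\,x_i
```

Each sample's error is multiplied by that sample's own input before the average is taken. Averaging the errors first and then multiplying by a single input gives a different result in general, because the mean of a product is not the product of the means.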
I was under the impression that when using mini-batches you do not have to propagate any error back until you have looped through the whole mini-batch. I was wrong: you still need to backpropagate for every sample. The only difference is that you update the weights only once you have looped through the mini-batch.
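A minimal sketch of that procedure is shown here. It does not use the network from the question: `TinyNet` and `minibatch_step` are hypothetical names, and the model is assumed to be the toy one-input, one-hidden-unit, one-output network with linear units and a squared-error cost.

```c
#include <stddef.h>

/* Hypothetical toy model (not the network from the question): one input,
   one linear hidden unit, one linear output, squared-error cost. */
typedef struct {
    double w1; /* input -> hidden weight */
    double w2; /* hidden -> output weight */
} TinyNet;

/* One mini-batch step: backpropagate every sample individually,
   accumulate the per-weight gradients, then apply their average once. */
void minibatch_step(TinyNet *net, const double *x, const double *y,
                    size_t batch_size, double learning_rate) {
    double acc_w1 = 0.0, acc_w2 = 0.0; /* gradient accumulators */

    for (size_t i = 0; i < batch_size; i++) {
        double h = net->w1 * x[i];      /* forward pass */
        double out = net->w2 * h;
        double out_error = out - y[i];  /* dC/dO for this sample */

        /* Each sample's error is multiplied by that sample's own input
           (or hidden value); the weights stay fixed during the batch. */
        acc_w2 += out_error * h;
        acc_w1 += out_error * net->w2 * x[i];
    }

    /* A single update per mini-batch, using the averaged gradients. */
    net->w1 -= learning_rate * acc_w1 / (double)batch_size;
    net->w2 -= learning_rate * acc_w2 / (double)batch_size;
}
```

In the full network the same idea would presumably mean keeping one accumulator per entry of w1 to w4, adding to it where updateWeights() currently subtracts from the weight, and applying the averaged value once per mini-batch.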