Based on the Coursera machine learning course, I'm trying to implement the cost function for a neural network in Python. There is a question similar to this one -- with an accepted answer -- but the code in that answer is written in Octave. Not to be lazy, I have tried to adapt the relevant concepts of the answer to my case, and as far as I can tell I'm implementing the function correctly. The cost I output differs from the expected cost, however, so I'm doing something wrong.
Here's a small reproducible example:
The following link leads to an .npz file which can be loaded (as below) to obtain the relevant data. Please rename the file to "arrays.npz" if you use it.
http://www.filedropper.com/arrays_1
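For anyone unfamiliar with the format: an .npz archive stores several named NumPy arrays, and the names can be listed before loading. A minimal sketch (the archive created here is a stand-in, not the linked file):

```python
import numpy as np

# Create a small stand-in .npz archive and inspect it, mirroring
# the loading pattern used in the code below
np.savez("arrays.npz", Y=np.array([[1], [2], [3]]))
with np.load("arrays.npz") as data:
    print(sorted(data.files))  # names of the arrays stored in the archive
    Y = data['Y']              # arrays are accessed like dict entries
print(Y.shape)
```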
import numpy as np

if __name__ == "__main__":
    with np.load("arrays.npz") as data:
        thrLayer = data['thrLayer'] # The final layer post activation; you
        # can derive this final layer, if verification is needed, using the weights below
        thetaO = data['thetaO']     # The weight array between layers 1 and 2
        thetaT = data['thetaT']     # The weight array between layers 2 and 3
        Ynew = data['Ynew']         # The output array with a 1 in position i and 0s elsewhere;
        # class i is the class that the data described by X[i,:] belongs to
        X = data['X']               # Raw data with 1s appended to the first column
        Y = data['Y']               # One-dimensional column vector; entry i contains the class of entry i

    m = len(thrLayer)
    k = thrLayer.shape[1]
    cost = 0
    for i in range(m):
        for j in range(k):
            cost += -Ynew[i,j]*np.log(thrLayer[i,j]) - (1 - Ynew[i,j])*np.log(1 - thrLayer[i,j])
    print(cost)
    cost /= m

    '''
    Regularized Cost Component
    '''
    regCost = 0
    for i in range(len(thetaO)):
        for j in range(1, len(thetaO[0])):
            regCost += thetaO[i,j]**2
    for i in range(len(thetaT)):
        for j in range(1, len(thetaT[0])):
            regCost += thetaT[i,j]**2
    regCost *= lam/(2*m)

    print(cost)
    print(regCost)
In actuality, cost should be 0.287629 and cost + regCost should be 0.383770.
This is the cost function from the question above, for reference (the regularized cost from the course, which the code is implementing):

J(Θ) = (1/m) Σ_{i=1..m} Σ_{k=1..K} [ -y_k^(i) log((h_Θ(x^(i)))_k) - (1 - y_k^(i)) log(1 - (h_Θ(x^(i)))_k) ] + (λ/(2m)) Σ (non-bias weights)^2
The problem is that you are using the wrong class labels. When computing the cost function, you need to use the ground truth, or the true class labels.
I'm not sure what your Ynew array was, but it wasn't the training outputs. So, I changed your code to use Y for the class labels in place of Ynew, and got the correct cost.
import numpy as np

with np.load("arrays.npz") as data:
    thrLayer = data['thrLayer'] # The final layer post activation; you
    # can derive this final layer, if verification is needed, using the weights below
    thetaO = data['thetaO']     # The weight array between layers 1 and 2
    thetaT = data['thetaT']     # The weight array between layers 2 and 3
    Ynew = data['Ynew']         # The output array with a 1 in position i and 0s elsewhere;
    # class i is the class that the data described by X[i,:] belongs to
    X = data['X']               # Raw data with 1s appended to the first column
    Y = data['Y']               # One-dimensional column vector; entry i contains the class of entry i

m = len(thrLayer)
k = thrLayer.shape[1]
cost = 0

# Build the one-hot ground-truth matrix from the true class labels Y
Y_arr = np.zeros(Ynew.shape)
for i in range(m):
    Y_arr[i, int(Y[i,0]) - 1] = 1

for i in range(m):
    for j in range(k):
        cost += -Y_arr[i,j]*np.log(thrLayer[i,j]) - (1 - Y_arr[i,j])*np.log(1 - thrLayer[i,j])
cost /= m

'''
Regularized Cost Component
'''
regCost = 0
for i in range(len(thetaO)):
    for j in range(1, len(thetaO[0])):
        regCost += thetaO[i,j]**2
for i in range(len(thetaT)):
    for j in range(1, len(thetaT[0])):
        regCost += thetaT[i,j]**2
lam = 1
regCost *= lam/(2.*m)

print(cost)
print(cost + regCost)
This outputs:
0.287629165161
0.383769859091
Edit: Fixed an integer division error in regCost *= lam/(2*m) that was zeroing out regCost.
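Both snippets above compute the cost with explicit Python loops. The same computation can be vectorized with NumPy, which is both shorter and faster on large arrays. This is a sketch of the same formula, not code from the original post; the function name nn_cost and its signature are my own:

```python
import numpy as np

def nn_cost(thrLayer, Y, thetaO, thetaT, lam=1.0):
    """Vectorized regularized cross-entropy cost.

    thrLayer       : (m, k) output-layer activations (hypothesis values)
    Y              : (m, 1) true class labels in 1..k
    thetaO, thetaT : weight matrices; column 0 (bias) is not regularized
    Returns (unregularized cost, regularized cost).
    """
    m, k = thrLayer.shape
    # One-hot encode the labels: row i has a 1 in column Y[i]-1
    Y_arr = np.eye(k)[Y.ravel().astype(int) - 1]
    # Unregularized cross-entropy cost, summed over all samples and classes
    cost = -np.sum(Y_arr * np.log(thrLayer)
                   + (1 - Y_arr) * np.log(1 - thrLayer)) / m
    # Regularization: sum of squared weights, skipping the bias column
    regCost = (lam / (2.0 * m)) * (np.sum(thetaO[:, 1:] ** 2)
                                   + np.sum(thetaT[:, 1:] ** 2))
    return cost, cost + regCost
```

With the arrays from the .npz file this should reproduce the two printed values above, e.g. `cost, total = nn_cost(thrLayer, Y, thetaO, thetaT, lam=1)`.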