I have a simple NN model for detecting hand-written digits from a 28x28px image written in python using Keras (Theano backend):
model0 = Sequential()
#number of epochs to train for
nb_epoch = 12
#amount of data each iteration in an epoch sees
batch_size = 128
model0.add(Flatten(input_shape=(1, img_rows, img_cols)))
model0.add(Dense(nb_classes))
model0.add(Activation('softmax'))
model0.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
model0.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
verbose=1, validation_data=(X_test, Y_test))
score = model0.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
This runs well and I get ~90% accuracy. I then perform the following command to get a summary of my network's structure by doing print(model0.summary())
. This outputs the following:
Layer (type) Output Shape Param # Connected to
=====================================================================
flatten_1 (Flatten) (None, 784) 0 flatten_input_1[0][0]
dense_1 (Dense) (None, 10) 7850 flatten_1[0][0]
activation_1 (None, 10) 0 dense_1[0][0]
======================================================================
Total params: 7850
I don't understand how they get to 7850 total params and what that actually means?
Model summary summary() to print a useful summary of the model, which includes: Name and type of all layers in the model. Output shape for each layer. Number of weight parameters of each layer.
From the definition of Keras documentation the Sequential model is a linear stack of layers.You can create a Sequential model by passing a list of layer instances to the constructor: from keras.models import Sequential from keras.layers import Dense, Activation model = Sequential([ Dense(32, input_shape=(784,)), ...
Defining models and layers in TensorFlow. Most models are made of layers. Layers are functions with a known mathematical structure that can be reused and have trainable variables. In TensorFlow, most high-level implementations of layers and models, such as Keras or Sonnet, are built on the same foundational class: tf.
The number of parameters is 7850 because with every hidden unit you have 784 input weights and one weight of connection with bias. This means that every hidden unit gives you 785 parameters. You have 10 units so it sums up to 7850.
The role of this additional bias term is really important. It significantly increases the capacity of your model. You can read details e.g. here Role of Bias in Neural Networks.
I feed a 514 dimensional real-valued input to a Sequential
model in Keras.
My model is constructed in following way :
predictivemodel = Sequential()
predictivemodel.add(Dense(514, input_dim=514, W_regularizer=WeightRegularizer(l1=0.000001,l2=0.000001), init='normal'))
predictivemodel.add(Dense(257, W_regularizer=WeightRegularizer(l1=0.000001,l2=0.000001), init='normal'))
predictivemodel.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
When I print model.summary()
I get following result:
Layer (type) Output Shape Param # Connected to
================================================================
dense_1 (Dense) (None, 514) 264710 dense_input_1[0][0]
________________________________________________________________
activation_1 (None, 514) 0 dense_1[0][0]
________________________________________________________________
dense_2 (Dense) (None, 257) 132355 activation_1[0][0]
================================================================
Total params: 397065
________________________________________________________________
For the dense_1 layer , number of params is 264710. This is obtained as : 514 (input values) * 514 (neurons in the first layer) + 514 (bias values)
For dense_2 layer, number of params is 132355. This is obtained as : 514 (input values) * 257 (neurons in the second layer) + 257 (bias values for neurons in the second layer)
For Dense Layers:
output_size * (input_size + 1) == number_parameters
For Conv Layers:
output_channels * (input_channels * window_size + 1) == number_parameters
Consider following example,
model = Sequential([
Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
Conv2D(64, (3, 3), activation='relu'),
Conv2D(128, (3, 3), activation='relu'),
Dense(num_classes, activation='softmax')
])
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 222, 222, 32) 896
_________________________________________________________________
conv2d_2 (Conv2D) (None, 220, 220, 64) 18496
_________________________________________________________________
conv2d_3 (Conv2D) (None, 218, 218, 128) 73856
_________________________________________________________________
dense_9 (Dense) (None, 218, 218, 10) 1290
=================================================================
Calculating params,
assert 32 * (3 * (3*3) + 1) == 896
assert 64 * (32 * (3*3) + 1) == 18496
assert 128 * (64 * (3*3) + 1) == 73856
assert num_classes * (128 + 1) == 1290
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With