what is the difference between using softmax as a sequential layer in tf.keras and softmax as an activation function for a dense layer?
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
and
tf.keras.layers.Softmax(10)
Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution. The softmax of a vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probabilities.
Activation functions are a critical part of the design of a neural network. The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.
Softmax is a mathematical function that converts a vector of numbers into a vector of probabilities, where the probabilities of each value are proportional to the relative scale of each value in the vector.
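As a quick sanity check, here is a minimal NumPy sketch of that formula (the input vector is made up for illustration):

```python
import numpy as np

# a hypothetical vector of logits
x = np.array([1.0, 2.0, 3.0])

# softmax: exp(x) / sum(exp(x))
probs = np.exp(x) / np.sum(np.exp(x))

print(probs)        # each entry proportional to exp of the input
print(probs.sum())  # sums to 1, so it reads as a probability distribution
```

Larger inputs get exponentially more of the probability mass, which is why the output preserves the relative ordering of the logits.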
A Keras Dense layer computes the dot product of the input tensor and the weight kernel matrix, adds a bias vector, and then applies the activation element-wise to the output values.
They are the same; you can test it yourself:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Softmax

# generate data
x = np.random.uniform(0, 1, (5, 20)).astype('float32')

# 1st option: softmax as the Dense layer's activation
X = Dense(10, activation=tf.nn.softmax)
A = X(x)

# 2nd option: the dense computation by hand, then a Softmax layer
w, b = X.get_weights()
B = Softmax()(tf.matmul(x, w) + b)

tf.reduce_all(A == B)
# <tf.Tensor: shape=(), dtype=bool, numpy=True>
Also note that tf.keras.layers.Softmax does not take a number of units; it is a plain activation layer. Its first argument is axis, so Softmax(10) in the question would set the axis to 10, not the output size.
By default, the softmax is computed over the last axis (axis=-1). If your outputs have more than 2 dimensions and you want to apply softmax along another axis, you can change this easily in the second option.
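To illustrate the axis argument, a small sketch on a made-up 3-D tensor (shape and values are arbitrary):

```python
import numpy as np
import tensorflow as tf

# hypothetical 3-D tensor: (batch, rows, classes)
x = np.random.uniform(0, 1, (2, 3, 4)).astype('float32')

# default: softmax over the last axis
last = tf.keras.layers.Softmax()(x)

# softmax over axis 1 instead
ax1 = tf.keras.layers.Softmax(axis=1)(x)

# each slice along the chosen axis now sums to 1
print(tf.reduce_sum(last, axis=-1))  # all ones
print(tf.reduce_sum(ax1, axis=1))    # all ones
```

Whichever axis you pick is the one whose entries are normalized into a probability distribution.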