In Keras, why is it that input_shape does not include the batch dimension when passed as an argument to layers like Dense but DOES include the batch dimension when input_shape is passed to the build method of a model?
import tensorflow as tf
from tensorflow.keras.layers import Dense

if __name__ == "__main__":
    # input_shape given to the layer: the batch dimension is left out
    model1 = tf.keras.Sequential([Dense(1, input_shape=[10])])
    model1.summary()

    # input_shape given to build: the batch dimension comes first
    model2 = tf.keras.Sequential([Dense(1)])
    model2.build(input_shape=[None, 10])  # why [None, 10] and not [10]?
    model2.summary()
Is this a conscious choice of API design? If it is, why?
None means that any batch size is accepted: when the batch dimension is set to None, the batch size is not fixed to a specific number. In the output of summary, Params counts each layer's parameters, trainable and non-trainable.
The input shape: in Keras, the input layer itself is not a layer but a tensor. It is the starting tensor you send to the first hidden layer, and it must have the same shape as your training data. For example, if you have 30 images of 50x50 pixels in RGB (3 channels), the shape of your input data is (30, 50, 50, 3). Here 30 is the batch dimension, so the corresponding input_shape is (50, 50, 3).
Dense is used to create fully connected layers, in which every output depends on every input. Among its parameters, units is a positive integer that defines the dimensionality of the output space.
input_shape : Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
You can specify the input shape of your model in several different ways. For example by providing one of the following arguments to the first layer of your model:
batch_input_shape: A tuple where the first dimension is the batch size.
input_shape: A tuple that does not include the batch size, i.e., the batch size is assumed to be None or batch_size, if specified.
input_dim: A scalar indicating the dimension of the input.

In all these cases, Keras internally stores an attribute _batch_input_shape, which it uses to build the model.
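To make the relationship between the three arguments concrete, here is a minimal pure-Python sketch of how they could all reduce to one batch-inclusive shape. The helper resolve_batch_input_shape is hypothetical and written only for illustration; it mirrors the behavior described above and is not the Keras implementation.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def resolve_batch_input_shape(
    input_dim: Optional[int] = None,
    input_shape: Optional[Sequence[Optional[int]]] = None,
    batch_input_shape: Optional[Sequence[Optional[int]]] = None,
    batch_size: Optional[int] = None,
) -> Shape:
    """Hypothetical helper: reduce the three spellings to one batch-inclusive shape."""
    # batch_input_shape already carries the batch dimension, so use it as given.
    if batch_input_shape is not None:
        return tuple(batch_input_shape)
    # input_dim is shorthand for a one-dimensional input_shape.
    if input_shape is None and input_dim is not None:
        input_shape = (input_dim,)
    if input_shape is None:
        raise ValueError("no input shape information given")
    # input_shape omits the batch dimension: prepend batch_size (None if unset).
    return (batch_size,) + tuple(input_shape)


shape_from_dim = resolve_batch_input_shape(input_dim=10)
shape_from_input = resolve_batch_input_shape(input_shape=[10])
shape_from_batch = resolve_batch_input_shape(batch_input_shape=[None, 10])
shape_with_batch_size = resolve_batch_input_shape(input_shape=[10], batch_size=32)
```

Under these assumptions the first three calls all yield (None, 10), which is the same shape that model2.build receives in the question.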
Regarding the build method, my guess is that this is indeed a conscious choice: information about the batch size might be useful to build the model in some (perhaps unthought-of) situations. Therefore, a framework that includes the batch dimension as input to build is more generic and complete than a framework that doesn't. Nonetheless, I agree with you that naming the argument batch_input_shape instead of input_shape would make everything more consistent.
It is also worth mentioning that users rarely need to call the build method themselves. This happens internally when it is needed. Nowadays, it is even possible to omit the input_shape argument when creating the model (although methods like summary will then not work until the model is built). In this case, Keras is able to infer the input shape from the argument x of fit.
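The deferred-build idea can be illustrated with a toy layer in plain Python. ToyDense below is a hypothetical stand-in, not Keras code: it only shows how a build method that receives the full, batch-inclusive shape can be triggered automatically on the first call, with the shape taken from the data.

```python
from typing import List, Optional, Tuple


class ToyDense:
    """Toy stand-in for a dense layer (illustration only, not Keras)."""

    def __init__(self, units: int) -> None:
        self.units = units
        self.built = False
        self.kernel_shape: Optional[Tuple[int, int]] = None

    def build(self, input_shape: Tuple[Optional[int], ...]) -> None:
        # input_shape includes the batch axis; only the last axis sizes the weights.
        in_features = input_shape[-1]
        if in_features is None:
            raise ValueError("the last dimension must be known to create weights")
        self.kernel_shape = (in_features, self.units)
        self.built = True

    def __call__(self, batch: List[List[float]]) -> List[List[float]]:
        if not self.built:
            # Lazy build: derive the shape from the data, leaving the batch axis as None.
            self.build((None, len(batch[0])))
        # The toy holds no real weights; it returns zeros of the right output shape.
        return [[0.0] * self.units for _ in batch]


layer = ToyDense(units=1)
built_before = layer.built                        # nothing is built at construction
outputs = layer([[0.0] * 10 for _ in range(4)])   # first call triggers build
```

In this sketch the layer is unbuilt until it first sees data, which is analogous to summary not working on a model that has no input shape yet.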