I'm trying to build a simple CNN classifier. My training images (BATCH_SIZEx227x227x1) and labels (BATCH_SIZEx7) are numpy ndarrays that are fed to the model in batches via ImageDataGenerator. The loss function I'm using is sparse_categorical_crossentropy (passed by name to model.compile). The problem arises when the model tries to train: the model (batch size is 1 for my simplified experiments) outputs a shape of [1, 7], while the labels have shape [7].
I'm almost positive I know the cause of this, but I am unsure how to fix it. My hypothesis is that sparse_categorical_crossentropy is squeezing the dimensions of my labels (e.g. when BATCH_SIZE is 2, the ground-truth label shape is squeezed from [2, 7] to [14]), which makes it impossible for me to fix the label shape, and all my attempts to fix the logits shape have been fruitless.
I originally tried fixing the labels' shape with np.expand_dims, but the loss function always flattens the labels, no matter how I expand the dimensions.
Following that, I tried adding a tf.keras.layers.Flatten() at the end of my model to get rid of the extraneous first dimension, but it had no effect; I still got the exact same error.
Next, I tried using tf.keras.layers.Reshape((-1,)) to squeeze all the dimensions. However, that resulted in a different error:
in sparse_categorical_crossentropy
    logits = array_ops.reshape(output, [-1, int(output_shape[-1])])
TypeError: __int__ returned non-int (type NoneType)
Question: How can I squash the shape of the logits so that it matches the shape of the labels as flattened by sparse_categorical_crossentropy?
import tensorflow as tf

# generator (an ImageDataGenerator), train_images and train_labels are defined earlier in the script

### BUILD SHAPE OF THE MODEL ###
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), padding='same', activation=tf.nn.relu,
                           input_shape=(227,227,1)),
    tf.keras.layers.MaxPooling2D((2,2), strides=2),
    tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D((2,2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(7, activation=tf.nn.softmax),  # final layer with one node per class
    #tf.keras.layers.Reshape((-1,))
])
# specify loss function and optimizer
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

### TRAIN THE MODEL ###
# specify training metadata
BATCH_SIZE = 1
print("about to train")
# train the model on the training data
num_epochs = 1
model.fit_generator(generator.flow(train_images, train_labels, batch_size=BATCH_SIZE), epochs=num_epochs)
--- full error trace ---
Traceback (most recent call last):
File "classifier_model.py", line 115, in <module>
model.fit_generator(generator.flow(train_images, train_labels, batch_size=BATCH_SIZE), epochs=num_epochs)
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1426, in fit_generator
initial_epoch=initial_epoch)
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 191, in model_iteration
batch_outs = batch_function(*batch_data)
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1191, in train_on_batch
outputs = self._fit_function(ins) # pylint: disable=not-callable
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
run_metadata=self.run_metadata)
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/Users/grammiegramco/Desktop/projects/HiRISE/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [1,7] and labels shape [7]
[[{{node loss/dense_1_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
No, you got the cause all wrong. You are giving one-hot encoded labels, but sparse_categorical_crossentropy expects integer labels, as it does the one-hot encoding itself (hence, sparse).
An easy solution would be to change the loss to categorical_crossentropy, not the sparse version. Also note that y_true with shape (7,) is incorrect; it should be (1, 7).
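To make the difference between the two losses concrete, here is a minimal NumPy sketch (not the Keras implementation itself). The label and probability arrays are made-up values for a hypothetical batch of 2 samples and 7 classes. It shows that the categorical loss on one-hot labels and the sparse loss on integer labels compute the same per-sample value; only the expected label format differs.

```python
import numpy as np

# Hypothetical batch: 2 samples, 7 classes, one-hot encoded like the labels in the question.
one_hot_labels = np.array([
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
], dtype=np.float32)

# Hypothetical softmax outputs of shape (2, 7); each row sums to 1.
probs = np.full((2, 7), 0.05, dtype=np.float32)
probs[0, 2] = 0.7
probs[1, 5] = 0.7

# Option 1: keep the one-hot labels, shape (2, 7), and use the categorical (non-sparse) loss.
categorical_loss = -np.sum(one_hot_labels * np.log(probs), axis=1)

# Option 2: convert to integer class ids, shape (2,), and use the sparse loss.
int_labels = np.argmax(one_hot_labels, axis=1)
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(int_labels.shape)                             # one integer per sample
print(np.allclose(categorical_loss, sparse_loss))   # both formulations agree
```

In the question's setup, option 1 would correspond to passing loss='categorical_crossentropy' to model.compile and leaving the labels as they are. Option 2 would correspond to keeping the sparse loss and feeding np.argmax(train_labels, axis=1) to generator.flow instead, assuming train_labels is one-hot with shape (N, 7).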
Please consider adding a Flatten layer before all the Dense layers. I had the exact same issue as you and had to change from categorical_crossentropy to sparse_categorical_crossentropy. Since sparse_categorical_crossentropy involves one-hot encoding, your array needs to be a lower-dimensional (2D) array rather than the 4D array that the CNN layers output.
This fixed the issue for me!
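As a rough illustration of the 4D-to-2D point, the sketch below traces the feature-map shape through the layers listed in the question using plain Python. It assumes the Keras defaults (padding='valid' for MaxPooling2D, and 'same' convolutions leaving the spatial size unchanged); pool_out is a hypothetical helper written for this example, not a Keras function.

```python
def pool_out(size, pool=2, stride=2):
    # Output length of a 'valid' pooling window along one spatial axis.
    return (size - pool) // stride + 1

h = w = 227
channels = 1

# Conv2D(32, padding='same') keeps 227x227; MaxPooling2D((2,2), strides=2) shrinks it.
h, w, channels = pool_out(h), pool_out(w), 32

# Conv2D(64, padding='same') keeps the size; the second pooling shrinks it again.
h, w, channels = pool_out(h), pool_out(w), 64

# Flatten turns (batch, h, w, channels) into (batch, h * w * channels).
flat = h * w * channels

print((h, w, channels))
print(flat)
```

With a batch size of 1, the tensor entering Flatten would have shape (1, 56, 56, 64) and the tensor leaving it would have shape (1, 200704), which is the 2D input the Dense layers operate on.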