How can I train the model to recognize five numbers in one picture? The code is as follows:
import keras
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dropout, Dense, Input
from keras.models import Model, Sequential

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 140, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dropout(0.5))
# Here should be a loop for recognizing each number in the picture,
# but I don't know how to realize it.
model.add(Dense(11, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=1000,
          epochs=8,
          verbose=1,
          validation_data=(X_valid, y_valid))
The picture of combined MNIST digits looks as follows (image omitted here):
The classic work in this area is 'Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks'.
Keras model (functional, not Sequential):

from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense, Input
from keras.models import Model
from keras.optimizers import Adam

inputs = Input(shape=(28, 140, 1), name="input")
x = inputs
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
digit1 = Dense(10, activation='softmax', name='digit1')(x)
digit2 = Dense(10, activation='softmax', name='digit2')(x)
digit3 = Dense(10, activation='softmax', name='digit3')(x)
digit4 = Dense(10, activation='softmax', name='digit4')(x)
digit5 = Dense(10, activation='softmax', name='digit5')(x)
predictions = [digit1, digit2, digit3, digit4, digit5]
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=Adam(), metrics=['accuracy'], loss='categorical_crossentropy')
P.S. You may use 11 classes per output instead of 10: the digits 0-9 plus an "empty" class for a missing position.
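Since the functional model has five output heads, model.fit expects a list of 5 label arrays, one per head. A minimal NumPy sketch of preparing those labels, assuming integer digit labels and the 11-class convention from the P.S. (all array names here are assumptions, not from the question):

```python
import numpy as np

# Hypothetical integer labels: one row of 5 digits per image,
# with class 10 standing for an empty position.
digits = np.array([[9, 7, 5, 4, 10],
                   [1, 2, 3, 0, 10]])

num_classes = 11
# One-hot encode each digit position separately; a 5-output model is
# then trained with a list of label arrays, one per Dense head:
# model.fit(X_train, y_list, ...)
y_list = [np.eye(num_classes)[digits[:, i]] for i in range(5)]
```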
I suggest two possible approaches:
Case 1- The images are nicely structured.
In the example you provided this is indeed the case, so if your data looks like the image you linked, I suggest this approach.
In that image, every combined picture basically consists of five 28-by-28-pixel images stacked together. In this case, I would suggest cutting the images (that is, cutting each image into 5 pieces) and training your model as with the usual MNIST data (for example, using the code you provided). Then, when you want to apply your model to classify new data, just cut each new image into 5 pieces as well, classify each of these 5 pieces using your model, and write the 5 predicted numbers right next to each other as the output.
So regarding this sentence:

Here should be a loop for recognizing each number in the picture, but I don't know how to realize it

you don't need a for loop. Just cut your images.
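The cutting step can be sketched with plain NumPy; the array names are assumptions, and each combined image is taken to be 28x140 with the five digits side by side, as in the question's input_shape:

```python
import numpy as np

# Hypothetical batch of combined images: 28 pixels tall, 5 * 28 = 140 wide.
X_combined = np.random.rand(1000, 28, 140, 1)

# Cut each image into five 28x28 pieces along the width axis.
pieces = [X_combined[:, :, i * 28:(i + 1) * 28, :] for i in range(5)]

# Stack the pieces so an ordinary single-digit MNIST classifier can
# process them: shape (5 * 1000, 28, 28, 1).
X_single = np.concatenate(pieces, axis=0)
```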
Case 2- The images are not nicely structured.
In this case, each image is labeled with 5 numbers, so each row in y_train (and y_valid) will be a 0/1 vector with 55 entries.
The first 11 entries are the one-hot encoding of the first number, the next 11 entries are the one-hot encoding of the second number, and so on. So each row in y_train will have exactly 5 entries equal to 1 and the rest equal to 0.
In addition, instead of using a softmax activation on the output layer and a categorical_crossentropy loss, use a sigmoid activation and a 'binary_crossentropy' loss (see further discussion about the reasons here and here).
To summarize, replace this:

model.add(Dense(11, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

with this:

model.add(Dense(55, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adadelta())
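The 55-entry label rows described above can be built with a small helper; encode_labels is a hypothetical name, and class 10 is taken to mean an empty position:

```python
import numpy as np

def encode_labels(digit_rows, num_classes=11):
    """Multi-hot encode 5 digits per image into a 5 * 11 = 55 entry
    0/1 vector: digit d at position p sets index p * 11 + d to 1."""
    y = np.zeros((len(digit_rows), 5 * num_classes), dtype=np.float32)
    for row, digits in enumerate(digit_rows):
        for pos, d in enumerate(digits):
            y[row, pos * num_classes + d] = 1.0
    return y

# One image labeled with the digits 9, 7, 5, 4, 10 (10 = empty).
y_train = encode_labels([[9, 7, 5, 4, 10]])
```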
Since you already have a very well behaved image, all you have to do is expand the number of classes in your model.
You can use 5 times 11 classes instead of using just 11 classes.
The first 11 classes identify the first number, the following 11 classes identify the second number and so on. A total of 55 classes, 11 classes for each position in the image.
So, in short: each input image has shape (28,140) or (140,28), depending on which method you're using to load the images, and each row of Y_training has shape (55,), telling which number is in each of the 5 positions. Example: for the first image, with the digits 9, 7, 5, 4, 10, you'd create Y_training with the following positions containing the value 1:
Y_training[9] = 1
Y_training[18] = 1 #(18=7+11)
Y_training[27] = 1 #(27=5+22)
Y_training[37] = 1 #(37=4+33)
Y_training[54] = 1 #(54=10+44)
Create your model layers the way you want, pretty much the same as a regular MNIST model, that means: no need to try loops or things like that.
But it will probably need to be a little bigger than before.
You will not be able to use categorical_crossentropy anymore, since you will have 5 correct classes per image instead of just 1. If you're using "sigmoid" activations at the end, binary_crossentropy should be a good replacement.
Make sure your last layer fits the 55-element vector, for instance Dense(55, activation='sigmoid').
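At prediction time, the 5 digits can be recovered from the 55-element sigmoid output by taking the argmax within each 11-entry slice; decode_predictions is a hypothetical helper, shown here on a hand-built fake output:

```python
import numpy as np

def decode_predictions(pred, num_classes=11):
    """Turn a (batch, 55) array of per-class scores into 5 digit
    predictions per image: argmax inside each 11-entry slice."""
    return pred.reshape(-1, 5, num_classes).argmax(axis=-1)

# Fake network output whose peaks encode the digits 9, 7, 5, 4, 10.
fake = np.zeros((1, 55))
for pos, d in enumerate([9, 7, 5, 4, 10]):
    fake[0, pos * 11 + d] = 0.9
```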