Estimating high resolution images from lower ones using a Keras model based on ConvLSTM2D

Question

I'm trying to use the following ConvLSTM2D architecture to estimate high resolution image sequences from low resolution ones:

import numpy as np, scipy.ndimage, matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, ConvLSTM2D, MaxPooling2D, UpSampling2D
from sklearn.metrics import accuracy_score, confusion_matrix, cohen_kappa_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler
np.random.seed(123)

raw = np.arange(96).reshape(8,3,4)
data1 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=1, mode='nearest') #low res
print (data1.shape)
#(8, 300, 400)

data2 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=3, mode='nearest') #high res
print (data2.shape)
#(8, 300, 400)

X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
#(samples,time, rows, cols, channels)

model = Sequential()
input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
#samples, time, rows, cols, channels
model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))     
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))

print (model.summary())

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X_train, Y_train, 
          batch_size=1, epochs=10, verbose=1)

x,y = model.evaluate(X_train, Y_train, verbose=0)
print (x,y)

This code raises the following ValueError:

ValueError: Input 0 is incompatible with layer conv_lst_m2d_2: expected ndim=5, found ndim=4

How can I correct this ValueError? I think the problem is with the input shapes, but I could not figure out what exactly is wrong.
Note that the output should also be a sequence of images, not a classification result.

ldavid · Accepted Answer

This happens because a ConvLSTM2D layer expects temporal data (a 5D tensor), but your first layer was declared as a many-to-one model: by default it returns only the last output, a tensor of shape (batch_size, 300, 400, 16). That is, batches of images rather than batches of image sequences:

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))     
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))

You want the first layer's output to be a tensor of shape (batch_size, 8, 300, 400, 16) (i.e. sequences of images), so it can be consumed by the second ConvLSTM2D. The way to fix this is to set return_sequences=True in the first layer definition:

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
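
The shape bookkeeping behind this fix can be sketched without Keras. The helper below, convlstm2d_output_shape, is a hypothetical function written for illustration (it is not part of Keras); it assumes channels-last data and padding='same', as in the question:

```python
def convlstm2d_output_shape(input_shape, filters, return_sequences=False):
    """Output shape of a ConvLSTM2D layer, assuming padding='same', channels last.

    input_shape is (batch, time, rows, cols, channels).
    Hypothetical helper for illustration only, not a Keras function.
    """
    if len(input_shape) != 5:
        raise ValueError(f"expected ndim=5, found ndim={len(input_shape)}")
    batch, time, rows, cols, _ = input_shape
    if return_sequences:
        # One output frame per input frame: the time axis is kept.
        return (batch, time, rows, cols, filters)
    # Only the last output is returned: the time axis is dropped.
    return (batch, rows, cols, filters)


x_shape = (None, 8, 300, 400, 1)

# Original model: the first layer drops the time axis, so the second one fails.
first = convlstm2d_output_shape(x_shape, 16)
print(first)
try:
    convlstm2d_output_shape(first, 8)
except ValueError as err:
    print(err)

# Fixed model: the first layer keeps the time axis.
fixed_first = convlstm2d_output_shape(x_shape, 16, return_sequences=True)
fixed_second = convlstm2d_output_shape(fixed_first, 8)
print(fixed_first)
print(fixed_second)
```

Note that the second layer still returns a single image per sequence here, because return_sequences is left at its default of False.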

You mentioned classification. If what you intend is to classify entire sequences, then you need a classifier at the end:

from keras.layers import GlobalAveragePooling2D

model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
model.add(GlobalAveragePooling2D())
model.add(Dense(10, activation='softmax'))  # output shape: (None, 10)

But if you are trying to classify each image within the sequences, then you can simply reapply the classifier using TimeDistributed:

from keras.layers import Input, TimeDistributed, GlobalAveragePooling2D
from keras.models import Model

# Per-image classifier: pools each (300, 400, 8) feature map and predicts 10 classes.
x = Input(shape=(300, 400, 8))
y = GlobalAveragePooling2D()(x)
y = Dense(10, activation='softmax')(y)
classifier = Model(inputs=x, outputs=y)

x = Input(shape=(data1.shape[0], data1.shape[1], data1.shape[2], 1))
y = ConvLSTM2D(16, kernel_size=(3, 3),
               activation='sigmoid',
               padding='same',
               return_sequences=True)(x)
y = ConvLSTM2D(8, kernel_size=(3, 3),
               activation='sigmoid',
               padding='same',
               return_sequences=True)(y)
y = TimeDistributed(classifier)(y)  # output shape: (None, 8, 10)

model = Model(inputs=x, outputs=y)
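
To make the per-frame idea concrete, here is a rough NumPy sketch of what applying one classifier to every time step amounts to. The functions classifier and time_distributed, the random weights, and the reduced 30x40 frame size are all assumptions made for illustration; this mimics the behaviour conceptually and is not how Keras implements TimeDistributed:

```python
import numpy as np

rng = np.random.default_rng(0)


def classifier(images, weights, bias):
    """Global average pooling followed by a softmax layer.

    images: (n, rows, cols, channels) -> probabilities of shape (n, n_classes).
    """
    pooled = images.mean(axis=(1, 2))                     # (n, channels)
    logits = pooled @ weights + bias                      # (n, n_classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def time_distributed(fn, sequences, *args):
    """Apply fn to every frame by folding the time axis into the batch axis."""
    batch, time = sequences.shape[:2]
    frames = sequences.reshape((batch * time,) + sequences.shape[2:])
    out = fn(frames, *args)
    return out.reshape((batch, time) + out.shape[1:])


# Stand-in for the second ConvLSTM2D output: 2 sequences of 8 frames, 8 channels.
features = rng.normal(size=(2, 8, 30, 40, 8))
weights = rng.normal(size=(8, 10))
bias = np.zeros(10)

probs = time_distributed(classifier, features, weights, bias)
print(probs.shape)  # one 10-class distribution per frame
```

The same weights are shared across all frames, which is the point of wrapping the classifier rather than building one per time step.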

Finally, take a look at the examples in the Keras repository. There's one for a generative model using ConvLSTM2D.

Edit: to estimate data2 from data1...

If I got it right this time, X_train should be 1 sample containing a stack of 8 images of shape (300, 400, 1), not 8 samples each containing a stack of 1 image of shape (300, 400, 1).
If that's true, then:

X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)

Should be updated to:

X_train = data1.reshape(1, data1.shape[0], data1.shape[1], data1.shape[2], 1)
Y_train = data2.reshape(1, data2.shape[0], data2.shape[1], data2.shape[2], 1)
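
The difference between the two layouts is easy to check on a small array. This sketch uses the unzoomed 8x3x4 array from the question as a stand-in for data1; the variable names are only illustrative:

```python
import numpy as np

frames = np.arange(96).reshape(8, 3, 4)  # 8 frames of 3x4 pixels

# Layout from the question: 8 samples, each a sequence of length 1.
as_eight_samples = frames.reshape(frames.shape[0], 1, frames.shape[1], frames.shape[2], 1)

# Layout suggested here: 1 sample, a sequence of 8 frames.
as_one_sequence = frames.reshape(1, frames.shape[0], frames.shape[1], frames.shape[2], 1)

print(as_eight_samples.shape)  # (8, 1, 3, 4, 1)
print(as_one_sequence.shape)   # (1, 8, 3, 4, 1)

# In the second layout, index [0, t] is frame t of the single sequence.
print(np.array_equal(as_one_sequence[0, 5, :, :, 0], frames[5]))  # True
```

With the first layout the recurrent layers never see more than one time step per sample, so there is no temporal context to learn from.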

Also, accuracy doesn't usually make sense when your loss is mse: it is a classification metric, and the targets here are continuous pixel values. You can use other metrics such as mae.
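
As a small illustration of why a match-based metric says little about a regression, compare it with mse and mae on made-up values. The numbers are invented for the example, and the exact-match fraction is only a stand-in for an accuracy-style metric, not the exact formula Keras uses:

```python
import numpy as np

# Made-up continuous targets and predictions that are close but never identical.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([10.4, 19.7, 30.2, 39.5])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
exact_match = np.mean(y_true == y_pred)  # accuracy-style stand-in

print(round(mse, 3), round(mae, 3), exact_match)
```

The predictions are off by less than one unit on average, yet the match-based score is zero, which is why mae is the more informative companion to an mse loss.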

Now you just need to update your model to return sequences and to have a single filter in the last layer (because the images you are trying to estimate have a single channel):

model = Sequential()
input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
model.add(ConvLSTM2D(16, kernel_size=(3, 3), activation='sigmoid', padding='same',
                     input_shape=input_shape,
                     return_sequences=True))
model.add(ConvLSTM2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same',
                     return_sequences=True))

model.compile(loss='mse', optimizer='adam')

After that, model.fit(X_train, Y_train, ...) will start training normally:

Using TensorFlow backend.
(8, 300, 400)
(8, 300, 400)
Epoch 1/10
1/1 [==============================] - 5s 5s/step - loss: 2993.8701
Epoch 2/10
1/1 [==============================] - 5s 5s/step - loss: 2992.4492
Epoch 3/10
1/1 [==============================] - 5s 5s/step - loss: 2991.4536
Epoch 4/10
1/1 [==============================] - 5s 5s/step - loss: 2989.8523
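
One observation that is not part of the original answer: the loss above barely moves, and a plausible reason is that a sigmoid activation bounds the layer output to (0, 1) while the targets range from roughly 0 to 95. The sketch below, using the unzoomed array as a stand-in for data2 and two hypothetical helpers (min_max_scale, min_max_unscale), estimates the best mse a bounded output could reach and shows one way to rescale the targets:

```python
import numpy as np

raw = np.arange(96, dtype=float).reshape(8, 3, 4)  # stand-in for data2


def min_max_scale(a):
    """Scale an array to [0, 1]; also return the bounds needed to undo it."""
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo), (lo, hi)


def min_max_unscale(scaled, bounds):
    """Map values in [0, 1] back to the original range."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo


# Best case for an output confined to [0, 1]: predict the clipped target.
best_bounded = np.clip(raw, 0.0, 1.0)
floor_mse = np.mean((raw - best_bounded) ** 2)
print(floor_mse)  # roughly 2930, close to the losses in the log

# Scaling the targets brings them into the range the model can produce.
scaled, bounds = min_max_scale(raw)
restored = min_max_unscale(scaled, bounds)
print(scaled.min(), scaled.max())
```

If this is indeed the cause, training on scaled inputs and targets and unscaling the predictions afterwards should let the loss drop well below that floor; sklearn's MinMaxScaler, already imported in the question, is another option for the same step.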

Tags: tensorflow, keras, scikit-learn

Asked by Roman · 1 answer, by ldavid
